Fairness-Aware Federated Learning with Trajectory Shapley Value

By: Daniel Kuznetsov, Ziqi Wang

Federated learning is an emerging distributed paradigm that addresses the challenges posed by heterogeneous, privacy-sensitive data. It enables multiple clients to train a model collaboratively by aggregating their local updates at a server. However, conventional aggregation schemes typically use fixed weights that fail to reflect unequal and time-varying client contributions, leading to biased and unstable learning. To improve fairness and s...

Machine Learning | May 29, 2026 3:56pm

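The abstract contrasts fixed aggregation weights with weights that track each client's contribution. The abstract is cut off before it defines the Trajectory Shapley Value, so the sketch below is not the authors' method. It only illustrates the underlying Shapley-value idea: it computes exact per-round Shapley values for a toy utility and turns them into aggregation weights. The utility function, the client updates, the reference direction `target`, and the clip-and-normalize weighting rule are all hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(n_clients, utility):
    """Exact Shapley values for a utility defined on coalitions (frozensets of client ids).

    Needs O(2^n) utility evaluations, so it is only feasible for a handful of clients.
    """
    phi = np.zeros(n_clients)
    for i in range(n_clients):
        others = [j for j in range(n_clients) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n_clients - size - 1) / factorial(n_clients)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (utility(s | {i}) - utility(s))
    return phi


def contribution_weights(phi):
    # Hypothetical rule: negative contributions get zero weight; fall back to uniform.
    w = np.clip(phi, 0.0, None)
    total = w.sum()
    return w / total if total > 0 else np.full(len(phi), 1.0 / len(phi))


# Toy round: three client updates and a hypothetical "good" update direction.
updates = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
target = np.array([1.0, 0.0])


def utility(s):
    # Negative squared distance between the coalition's averaged update and the target.
    agg = updates[sorted(s)].mean(axis=0) if s else np.zeros_like(target)
    return -float(np.sum((agg - target) ** 2))


phi = shapley_values(len(updates), utility)
weights = contribution_weights(phi)
print(phi, weights)
```

In this toy round, client 2's update points away from `target`. It receives a negative Shapley value and therefore zero weight, while a fixed uniform scheme would still give it one third.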

Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts

By: Fanny Lehmann, Firat Ozdemir, Yun Cheng, Torsten Hoefler, Sebastian Schemm, Benedikt Soja, Siddhartha Mishra

While AI weather models excel at short-to-medium range forecasts (up to 15 days), they frequently suffer from ill-defined "instabilities" when rolled out over longer horizons. This work addresses the lack of a formal taxonomy by categorizing these failures into three distinct regimes: blow-up, drift, and loss of seasonality, through year-long rollouts of nine state-of-the-art AI weather models. Our analysis reveals that stability hinges on th...

Machine Learning | May 29, 2026 9:01am

CalArena: A Large-Scale Post-Hoc Calibration Benchmark

By: Eugène Berta, David Holzmüller, Francis Bach, Michael I. Jordan

Reliable probability estimates are critical in many machine learning applications, yet modern classifiers are often poorly calibrated. Post-hoc calibration provides a simple and widely used solution, but the large number of proposed methods, combined with small-scale and inconsistent evaluations, makes it difficult to determine which approaches are truly effective in practice. We introduce a large-scale, standardized benchmark for post-hoc ca...

Machine Learning | May 29, 2026 6:13am

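Temperature scaling is one widely used post-hoc calibration method, and expected calibration error (ECE) is one common metric. The abstract does not say which methods or metrics CalArena reports, so the following is only a generic sketch on synthetic data. It fits a single temperature on held-out logits by grid search and compares binned ECE before and after. The helper names, the 15-bin setting, and the synthetic "overconfident" data are assumptions.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def nll(logits, labels, temperature):
    p = softmax(logits, temperature)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))


def fit_temperature(logits, labels, grid=np.linspace(0.05, 10.0, 400)):
    # Grid search keeps the sketch dependency-free; gradient-based fitting of T is also common.
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def expected_calibration_error(probs, labels, n_bins=15):
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # Fraction of samples in the bin times |accuracy - mean confidence| in the bin.
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece


# Synthetic classifier whose logits are 3x too sharp (not data from the benchmark).
rng = np.random.default_rng(0)
n, k = 4000, 5
true_logits = 1.5 * rng.normal(size=(n, k))
cdf = softmax(true_logits).cumsum(axis=1)
labels = np.minimum((cdf < rng.random(n)[:, None]).sum(axis=1), k - 1)  # sample labels
model_logits = 3.0 * true_logits

cal, held_out = slice(0, n // 2), slice(n // 2, n)
T_hat = fit_temperature(model_logits[cal], labels[cal])
ece_before = expected_calibration_error(softmax(model_logits[held_out]), labels[held_out])
ece_after = expected_calibration_error(softmax(model_logits[held_out], T_hat), labels[held_out])
print(T_hat, ece_before, ece_after)
```

Because the synthetic logits were sharpened by a factor of 3, the fitted temperature should land near 3. The method only rescales logits and never changes the argmax, so accuracy is unchanged and only confidence moves.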

Self-Trained Verification for Training- and Test-Time Self-Improvement

By: Chen Henry Wu, Aditi Raghunathan

Self-improvement at scale has been a longstanding goal for reasoning models, and there are two natural places to do it: at test time, through verification-refinement (V-R) loops; and at training time, through self-training methods. Both are gated by the same bottleneck: the verifier. V-R loops stall when verifier scores inflate while accuracy stagnates, and when feedback is too generic to act on; self-training fails similarly when bad self-ge...

Machine Learning | May 29, 2026 6:12am

Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents

By: Wenhao Li, Xiangfeng Wang, Bo Jin

Diffusion-based planning has achieved strong results in single-agent offline reinforcement learning, yet scaling to many-agent systems remains intractable due to the curse of dimensionality in the joint trajectory space. We introduce MF-Diffuser, a framework that lifts trajectory planning to the Wasserstein space of trajectory distributions, where the propagation of chaos ensures a small representative subset of agents captures the full popul...

Machine Learning | May 29, 2026 6:06am

When, why, and how do diffusion posterior samplers fail? A finite-sample lens

By: Benjamin A. Burns, Sara Fridovich-Keil

Diffusion models have excellent capacity to model complex distributions of natural data, which has made them a popular and effective choice for posterior sampling in imaging inverse problems. Existing methods can incorporate any measurement model at inference time but must use an inexact approximation for the likelihood at intermediate timesteps for computational tractability. Although these approximations can often work well empirically, the...

Machine Learning | May 29, 2026 6:05am

Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

By: Alaa Khamis, Alaa Maalouf

Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if it is fast: selection and finetuning both happen per query, making each a direct bottleneck. Existing methods trade speed for quality: fast retrieval is often redundant, while stronger diversity-aware selection adds...

Machine Learning | May 29, 2026 6:04am

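The abstract contrasts fast but redundant retrieval with diversity-aware selection. One standard diversity-aware selector is greedy maximal marginal relevance (MMR), sketched below on embedding vectors. It is a swapped-in illustration, not the convex-reconstruction method named in the title. The function name, the trade-off parameter `lam`, and the toy vectors are assumptions.

```python
import numpy as np


def mmr_select(query, candidates, k, lam=0.7):
    """Greedy maximal marginal relevance over cosine similarities.

    lam = 1.0 ranks by relevance alone; smaller lam penalizes redundancy more.
    """
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    relevance = c @ q   # similarity of each candidate to the prompt
    pairwise = c @ c.T  # similarity between candidates
    selected, remaining = [], list(range(len(c)))
    while remaining and len(selected) < k:
        if selected:
            # Redundancy = similarity to the closest already-selected candidate.
            redundancy = pairwise[np.ix_(remaining, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lam * relevance[remaining] - (1 - lam) * redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


query = np.array([1.0, 0.0])
candidates = np.array([
    [1.0, 0.0],    # most relevant
    [0.99, 0.01],  # near-duplicate of candidate 0
    [0.7, 0.7],    # less relevant but different
])
print(mmr_select(query, candidates, k=2, lam=1.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, lam=0.3))  # [0, 2]
```

Pure relevance picks the near-duplicate pair, while the diversity-weighted setting swaps the duplicate for a different candidate. This selection step costs extra similarity computations per query, which is the overhead the abstract refers to.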

Transformers Provably Learn to Internalize Chain-of-Thought

By: Yixiao Huang, Hanlin Zhu, Zixuan Wang, Jiantao Jiao, Stuart Russell, Somayeh Sojoudi, Song Mei

Chain-of-Thought (CoT) prompting substantially improves the sample efficiency of transformers, reducing the complexity of tasks like parity learning from exponential to polynomial in the input length. However, generating explicit reasoning steps at inference is computationally expensive. Implicit Chain-of-Thought (ICoT) has emerged as a promising empirical remedy that trains models to internalize intermediate steps within their hidden states,...

Machine Learning | May 28, 2026 2:29am

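Parity is the abstract's running example. One common intuition for why an explicit chain of thought helps is that running prefix XORs break up a target that depends on every bit at once. They replace it with a sequence of steps, each combining the previous step with one new bit. The sketch below writes out those intermediate steps. It illustrates explicit CoT for parity only, not the paper's ICoT construction or its proof.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """Target label: XOR of all bits, a function of the whole input at once."""
    return reduce(xor, bits, 0)


def parity_chain_of_thought(bits):
    """Explicit intermediate steps: the running parity after each prefix of the input."""
    steps, acc = [], 0
    for b in bits:
        acc ^= b  # each step depends only on the previous step and one new bit
        steps.append(acc)
    return steps


bits = [1, 0, 1, 1]
print(parity_chain_of_thought(bits))  # [1, 1, 0, 1]
print(parity(bits))                   # 1
```

The last step always equals the parity. Emitting these steps as tokens is what makes explicit CoT expensive at inference, and ICoT aims to move them into hidden states instead.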

Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations

By: Kevin Y. Li, Asher Trockman, Ananda Theertha Suresh, Ziteng Sun

Softmax attention is the cornerstone of modern large language models, but its memory scales linearly and compute quadratically with sequence length. Linear recurrent models, such as linear attention and state space models, have become widely studied as alternatives to attention due to their linear compute and constant memory. While these sub-quadratic token mixing methods, or mixers, achieve promising efficiency gains and competitive results ...

Machine Learning | May 28, 2026 2:29am

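The abstract contrasts softmax attention with linear recurrent mixers that need only constant memory. A minimal sketch of causal linear attention shows where that property comes from. The same output can be computed either from a full T x T score matrix or from a fixed-size running state. The feature map elu(x) + 1 is one common choice and is assumed here. This is not the Multi-Mixer architecture.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so features stay positive.
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention_parallel(q, k, v):
    """Quadratic-time form: materializes the full T x T causal score matrix."""
    fq, fk = feature_map(q), feature_map(k)
    scores = np.tril(fq @ fk.T)  # causal mask: position t only sees s <= t
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


def linear_attention_recurrent(q, k, v):
    """Linear-time form: carries a fixed-size state instead of all past keys and values."""
    fq, fk = feature_map(q), feature_map(k)
    state = np.zeros((fk.shape[1], v.shape[1]))  # running sum of phi(k_s) v_s^T
    norm = np.zeros(fk.shape[1])                 # running sum of phi(k_s)
    out = np.empty((len(q), v.shape[1]))
    for t in range(len(q)):
        state += np.outer(fk[t], v[t])
        norm += fk[t]
        out[t] = (fq[t] @ state) / (fq[t] @ norm)
    return out


rng = np.random.default_rng(0)
T, d = 6, 4
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
print(np.allclose(linear_attention_parallel(q, k, v), linear_attention_recurrent(q, k, v)))  # True
```

The recurrent form's memory is the d x d state regardless of sequence length. Softmax attention, by contrast, must keep every past key and value because the exponential does not factor into a running sum this way.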

PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective

By: Yangyi Huang, Ruotian Peng, Zeju Qiu, Jiale Kang, Yandong Wen, Bernhard Schölkopf, Weiyang Liu

Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT should be assessed through the stability-plasticity dilemma: the trade-off between target-task adaptation and resistance to forgetting. We introduce PEFT-Arena, a benchmark that jointly measures downstream ...

Machine Learning | May 28, 2026 2:28am
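
LoRA is one widely used PEFT method; the abstract does not list which methods PEFT-Arena covers. The sketch below is a hypothetical NumPy `LoRALinear` class, not any library's API. It freezes the pretrained weight and trains only a low-rank update, with `B` initialized to zero so the adapted layer initially reproduces the pretrained one. The `alpha / rank` scaling convention and its default are assumptions.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=None, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))                    # trainable, zero-initialized
        # Assumed convention: scale = alpha / rank, with alpha defaulting to rank.
        self.scale = (alpha if alpha is not None else rank) / rank

    def __call__(self, x):
        # x has shape (batch, d_in); output has shape (batch, d_out).
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(1024, 1024))
layer = LoRALinear(W, rank=8)
x = rng.normal(size=(2, 1024))
print(np.allclose(layer(x), x @ W.T))    # True: B starts at zero
print(layer.trainable_params(), W.size)  # 16384 1048576
```

For this 1024 x 1024 layer at rank 8, the trainable parameter count drops from 1,048,576 to 16,384 (about 1.6%). How far the update then moves the layer away from `W` is where the adaptation-versus-forgetting trade-off plays out.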