Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization

By: Chengtao Jian, Kai Yang, Ye Ouyang, Xiaozhou Ye

Direct Preference Optimization (DPO) has emerged as a popular and efficient alternative to reward modeling and reinforcement learning for aligning language models with human preferences. Despite its empirical success, the theoretical properties and intrinsic limitations of DPO remain underexplored. In this work, we first present a comprehensive analysis of DPO's dynamics from a probability evolution perspective. Our analysis reveals that DPO ...

Artificial Intelligence | July 11, 2025 3:45am

Dynamic Chunking for End-to-End Hierarchical Sequence Modeling

By: Sukjun Hwang, Brandon Wang, Albert Gu

Despite incredible progress in language models (LMs) in recent years, largely resulting from moving away from specialized models designed for specific tasks to general models based on powerful architectures (e.g. the Transformer) that learn everything from raw data, pre-processing steps such as tokenization remain a barrier to true end-to-end foundation models. We introduce a collection of new techniques that enable a dynamic chunking mechani...

Machine Learning | July 11, 2025 3:33am

EXPO: Stable Reinforcement Learning with Expressive Policies

By: Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn

We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL presents a unique challenge of stable value maximization. Unlike simpler Gaussian policies commonly used in online RL, expressive policies like diffusion and flow-matching policies are parameterized by a long denoising chain, which hinders stable gradient propag...

Machine Learning | July 11, 2025 3:32am

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

By: Ziyue Li, Yang Li, Tianyi Zhou

Can a pretrained neural network adapt its architecture to different inputs without any finetuning? Do we need all layers for simple tasks, and are they adequate for challenging tasks? We found that the layers of a pretrained large language model (LLM) can be manipulated as separate modules to build a better and even shallower model customized for each test sample. In particular, each layer from the pretrained model can be skipped/pruned or re...

Machine Learning | July 11, 2025 3:32am

Measuring AI Alignment with Human Flourishing

By: Elizabeth Hilliard, Akshaya Jagadeesh, Alex Cook, Steele Billings, Nicholas Skytland, Alicia Llewellyn, Jackson Paull, Nathan Paull, Nolan Kurylo, Keatra Nesbitt, Robert Gruenewald, Anthony Jantzi, Omar Chavez

This paper introduces the Flourishing AI Benchmark (FAI Benchmark), a novel evaluation framework that assesses AI alignment with human flourishing across seven dimensions: Character and Virtue, Close Social Relationships, Happiness and Life Satisfaction, Meaning and Purpose, Mental and Physical Health, Financial and Material Stability, and Faith and Spirituality. Unlike traditional benchmarks that focus on technical capabilities or harm preve...

Artificial Intelligence | July 11, 2025 3:32am

AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift

By: Eunsu Baek, Keondo Park, Jeonggil Ko, Min-hwan Oh, Taesik Gong, Hyung-Sin Kim

Current AI advances largely rely on scaling neural models and expanding training datasets to achieve generalization and robustness. Despite notable successes, this paradigm incurs significant environmental, economic, and ethical costs, limiting sustainability and equitable access. Inspired by biological sensory systems, where adaptation occurs dynamically at the input (e.g., adjusting pupil size, refocusing vision), we advocate for adaptive s...

Artificial Intelligence | July 11, 2025 3:31am

Meek Models Shall Inherit the Earth

By: Hans Gundlach, Jayson Lynch, Neil Thompson

The past decade has seen incredible scaling of AI systems by a few companies, leading to inequality in AI model performance. This paper argues that, contrary to prevailing intuition, the diminishing returns to compute scaling will lead to a convergence of AI model capabilities. In other words, meek models (those with limited computation budget) shall inherit the earth, approaching the performance level of the best models overall. We develop a...

Artificial Intelligence | July 11, 2025 3:31am

DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models

By: Liang Wang, Yu Rong, Tingyang Xu, Zhenyi Zhong, Zhiyuan Liu, Pengju Wang, Deli Zhao, Qiang Liu, Shu Wu, Liang Wang

Molecular structure elucidation from spectra is a foundational problem in chemistry, with profound implications for compound identification, synthesis, and drug development. Traditional methods rely heavily on expert interpretation and lack scalability. Pioneering machine learning methods have introduced retrieval-based strategies, but their reliance on finite libraries limits generalization to novel molecules. Generative models offer a promi...

Machine Learning | July 11, 2025 1:26am

Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

By: Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao

Reinforcement Learning (RL) has demonstrated its potential to improve the reasoning ability of Large Language Models (LLMs). One major limitation of most existing Reinforcement Finetuning (RFT) methods is that they are on-policy RL in nature, i.e., data generated during the past learning process is not fully utilized. This inevitably comes at a significant cost of compute and time, posing a stringent bottleneck on continuing economic and effi...

Machine Learning | July 10, 2025 8:27pm

Self-Supervised Learning at the Edge: The Cost of Labeling

By: Roberto Pereira, Fernanda Famá, Asal Rangrazi, Marco Miozzo, Charalampos Kalalas, Paolo Dini

Contrastive learning (CL) has recently emerged as an alternative to traditional supervised machine learning solutions by enabling rich representations from unstructured and unlabeled data. However, CL and, more broadly, self-supervised learning (SSL) methods often demand a large amount of data and computational resources, posing challenges for deployment on resource-constrained edge devices. In this work, we explore the feasibility and effici...

Machine Learning | July 10, 2025 6:57pm