Machine Learning

Dense SAE Latents Are Features, Not Bugs
librarian
5 views
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
Mingkang Zhu
3 views
On the Hardness of Bandit Learning
librarian
2 views
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning
Junru Zhang
13 views
Rethinking Losses for Diffusion Bridge Samplers
librarian
22 views
Self-Adapting Language Models
Adam Zweiger
40 views
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang
49 views
Cost-Optimal Active AI Model Evaluation
librarian
63 views