Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

0 upvotes

By: Jaeyong Ko, Pilsung Kang, Yukyung Lee

Large language models (LLMs) reach high accuracy in mathematical reasoning, but individual traces on the same problem diverge; some arrive at the correct answer while others fail. Prior work analyzes failure at the step, chunk, or sentence level, or at tokens where failure has already occurred. Neither identifies the precise token that triggers the shift toward failure. We introduce the cliff token, a token where the token-wise potential drop...

Artificial Intelligence | June 25, 2026 1:58am

Comments (0)
Views (26)

AI Snitches Get Glitches: Towards Evading Agentic Surveillance

0 upvotes

By: Hyejun Jeong, Dzung Pham, Amir Houmansadr, Eugene Bagdasarian

To better assist users with completing challenging tasks, AI agents mediate communications, access data, and interact with different APIs. Many employers (and even nation-states) already provide their users with this technology. However, widespread adoption of AI agents creates a new risk: access to user data can be abused for another goal, surveilling users. These users might not even have the ability or permission to control the actions and data ...

Artificial Intelligence | June 25, 2026 1:41am

Comments (0)
Views (25)

Confidence Sequences for Online Statistical Model Checking of Markov Decision Processes

0 upvotes

By: Konstantin Kueffner, Tobias Meggendorfer, Maximilian Weininger, Patrick Wienhöft

Markov decision processes (MDPs) are a classic model of decision making under uncertainty, exhibiting both non-deterministic choice and probabilistic uncertainty. Traditionally, exact knowledge of the underlying probabilities is assumed. However, this is often unrealistic, e.g., when modelling cyber-physical systems or biological processes. Here, statistical methods provide a way towards obtaining meaningful guarantees. The classical a...

Artificial Intelligence | June 25, 2026 1:36am

Comments (0)
Views (21)

Decentralised AI Training and Inference with BlockTrain

0 upvotes

By: Peter Toth

Frontier AI training is increasingly shaped by access to dense, centrally controlled accelerator clusters. This creates a structural advantage for hyperscalers and large centralized laboratories, and makes open or independent AI efforts depend on scarce capital, privileged infrastructure, and data-center geography. We present Spheroid BlockTrain, a decentralized training protocol in which a model is partitioned into independently trainable bl...

Artificial Intelligence | June 24, 2026 10:56pm

Comments (0)
Views (27)

World Models in Pieces: Structural Certification for General Agents

0 upvotes

By: Yikai Lu, Yifei Wu, Xinyu Lu, Tongxin Li

In the big-world regime, agents cannot be universally capable and their ability is inevitably specialized across a world model in pieces. Consequently, standard uniform guarantees fail to distinguish between the understanding of critical bottlenecks and irrelevant failures. We first formalize this limitation by proving that general agents are not universal, rendering standard worst-case analysis uninformative. To overcome this, we introduce s...

Artificial Intelligence | June 24, 2026 3:44am

Comments (0)
Views (24)

OpenThoughts-Agent: Data Recipes for Agentic Models

0 upvotes

By: Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, Tyler Griggs, Alexander Glenn Shaw, Hritik Bansal, E. Kelly Buchanan, Artem Gazizov, Reinhard Heckel, Chinmay Hegde, Sankalp Jajee, Daanish Khazi, Emmanouil Koukoumidis, Xiangyi Li, Hange Liu, Shlok Natarajan, Harsh Raj, Nicholas Roberts, Ethan Shen, Nishad Singhi, Michael Siu, Ashima Suvarna, Hanwen Xing, Patrick Yubeaton, Robert Zhang, Leon Liangyu Chen, Xiaokun Chen, Steven Dillmann, Saadia Gabriel, Xunyi Jiang, Anurag Kashyap, Boxuan Li, Yein Park, Minh Pham, Sujay Sanghavi, Lin Shi, Ke Sun, Yixin Wang, Zhiwei Xu, Erica Zhang, Siyan Zhao, Wanjia Zhao, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt

Agentic language models dramatically expand the applications of AI, yet little is publicly known about how to curate training data for broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully open data cur...

Artificial Intelligence | June 24, 2026 3:43am

Comments (0)
Views (146)

A specialized reasoning large language model for accelerating rare disease diagnosis: a randomized AI physician assistance trial

0 upvotes

By: Haichao Chen, Songchi Zhou, Zhengyun Zhao, Shikai Hu, Xianghong Jin, Hongwei Ji, Li He, Shuli Li, Yiming Qin, Xin Tan, Runfeng Shi, Yih Chung Tham, Jiaye Zhu, Ye Li, Ye Jin, Longhao Cao, Dawei Li, Honghan Wu, Hongqiu Gu, Guanqiao Li, Tudor Groza, Chunying Li, Dian Zeng, Weihong Yu, Gareth Baynam, Saumya Shekhar Jamuar, Min Shen, Shuyang Zhang, Bin Sheng, Sheng Yu, Tien Yin Wong

Rare diseases affect millions of individuals worldwide, yet timely diagnosis remains a major public health challenge due to scarcity of specialized clinical expertise. While large language models (LLMs) show promise to support rare disease diagnosis, current models are constrained by insufficient clinical deployability, limited clinically grounded evidence, and scarcity of training data. Here we present RaDaR (Rare Disease navigatoR), an open...

Artificial Intelligence | June 24, 2026 1:01am

Comments (0)
Views (380)

ReM-MoA: Reasoning Memory Sustains Mixture-of-Agents Scaling

0 upvotes

By: Heng Ping, Arijit Bhattacharjee, Peiyu Zhang, Shixuan Li, Wei Yang, Ali Jannesari, Nesreen Ahmed, Paul Bogdan

Mixture-of-Agents (MoA) architectures improve inference-time scaling by organizing multiple LLM agents into layered reasoning pipelines. However, existing MoA variants fail to sustain gains as depth increases, exhibiting degradation, early plateauing, or saturation. We propose ReM-MoA, a memory-augmented MoA framework that sustains scaling through two mechanisms: (1) a Ranked Reasoning Memory that persistently stores and ranks reasoning trace...

Artificial Intelligence | June 24, 2026 1:00am

Comments (0)
Views (29)

VeriEvol: Scaling Multimodal Mathematical Reasoning via Verifiable Evol-Instruct

0 upvotes

By: Haoling Li, Kai Zheng, Jie Wu, Can Xu, Qingfeng Sun, Han Hu, Yujiu Yang

Scaling reinforcement learning for visual mathematical reasoning requires more than generating harder questions: as data volume grows, the reward labels themselves must remain reliable. Yet existing data pipelines scale supervision while trusting the labeller, and policy-side methods assume the underlying answers are already correct. We instead treat scaling as a verifiable data-construction problem and decouple two axes before any policy upd...

Artificial Intelligence | June 23, 2026 7:58am

Comments (0)
Views (22)

The Topology of Ill-Posed Questions: Persistent Homology for Detection and Steering in LLMs

0 upvotes

By: Guangyu Jiang, Sizhe Tang, Mahdi Imani, Tian Lan

Ill-posed questions, including ambiguous, underspecified, or contradictory queries, may admit no valid answer or multiple plausible answers, posing a challenge for large language models (LLMs). Existing approaches largely analyze ill-posedness through model outputs and often focus on specific subclasses. We investigate whether diverse sources of ill-posedness can be represented within a unified topology of LLM internal states and whether this...

Artificial Intelligence | June 23, 2026 7:57am