Artificial Intelligence

A Vocabulary for Multi-Agent Automated Research Systems

Bardiya Akhbari

14 views

OpenForgeRL: Train Harness-native Agents in Any Environment

librarian

31 views

Beyond Sycophancy: Structured Resistance and Compliance in LLM Moral Reasoning

librarian

26 views

Detecting LLM-Generated Tokens in Human–LLM Coauthored Text

librarian

27 views

PATS: Policy-Aware Training Scaffolding for Agentic Reinforcement Learning

Yipeng Shi

32 views

AREX: Towards a Recursively Self-Improving Agent for Deep Research

librarian

32 views

Agentic Context Management: Solving Agent Memory and Cost by Treating Them as Lifecycle and Architecture Problems

librarian

33 views

SoftReason: A Fully Differentiable Neuro-Soft-Symbolic Deductive Reasoning Architecture over High-Dimensional Perceptual Data

Wael AbdAlmageed

30 views

PRO-LONG: Programmatic Memory Enables Long-Horizon Reasoning

Alexis Fox

31 views

PoTRE: Test-Time Reasoning inspired by Cognitive Heterogeneity

librarian

28 views

Train the Model, Not the Reader: Decodability Supervision for Verifiable Activation Explanations

Hiskias Dingeto

26 views

ResearchArena: Evaluating Sabotage and Monitoring in Automated AI R&D

librarian

32 views

Agents in the Wild: Where Research Meets Deployment

Grace Hui Yang

28 views

CodeRescue: Budget-Calibrated Recovery Routing for Coding Agents

librarian

25 views

WorldCupArena: Fine-Grained Evaluation of Language Models and Deep-Research Agents on Football Forecasting

librarian

30 views

Rethinking Heterogeneous LLM Merging: A Weighted Model Averaging Perspective

librarian

28 views

Can We Break LLMs Out of Self-Loops? Fine-Grained Reasoning Control with Activation Steering

Sheldon Yu

25 views

Logical Judgments Under Pressure: Diagnosing Syllogistic Stability with Learned Soft Prefixes

librarian

27 views

AutoSynthesis: An agentic system for automated meta-analysis

librarian

48 views

When Words Are Safe But Actions Kill: Probing Physical Danger Beyond Text Safety in Hidden-State Risk Space

librarian

43 views

Concept-Guided Spatial Regularization for World Models in Atari Pong

librarian

47 views

Long-Context Fine-Tuning with Limited VRAM

librarian

42 views

MedFailBench: A Clinician-Built Open-Source Benchmark for Medical AI Safety Boundary Inspection

Goktug Ozkan

45 views

Can We Trust Item Response Theory for AI Evaluation?

Han Jiang

33 views

Benchmarking Multimodal Large Language Models for Scientific Visualization Literacy

Patrick Do

36 views

SearchOS-V1: Towards Robust Open-Domain Information-Seeking Agent Collaboration

librarian

37 views

The Industrialization of Research; On AI-Driven Science and Its Consequences

Emmanuel Jeannot

35 views

Pretraining Data Can Be Poisoned through Computational Propaganda

Victoria Graf

32 views

Experience Memory Graph: One-Shot Error Correction for Agents

Wenjun Wang

41 views

Reproducing human biases in route choice using large language models: Toward scalable behavioral modeling

Shuxian Xu

48 views

Interaction Scaling: Grounding the Third Axis of Test-Time Compute

Bojie Li

48 views

Think Through a Bottleneck: Hourglass Reasoning for Rigorous Induction

librarian

47 views
