Science Cast

How Uncertainty Estimation Scales with Sampling in Reasoning Models

Posted by librarian, March 20, 2026 5:11am



This paper is a preprint and has not been certified by peer review.

arXiv, March 19, 2026 12:00am

Authors

Maksym Del, Markus Kängsepp, Marharyta Domnich, Ardi Tampuu, Lisa Yankovskaya, Meelis Kull, Mark Fishel

Abstract

Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box approach using verbalized confidence and self-consistency. Across three reasoning models and 17 tasks spanning mathematics, STEM, and humanities, we characterize how these signals scale. Both self-consistency and verbalized confidence scale in reasoning models, but self-consistency exhibits lower initial discrimination and lags behind verbalized confidence under moderate sampling. Most uncertainty gains, however, arise from signal combination: with just two samples, a hybrid estimator improves AUROC by up to $+12$ on average and already outperforms either signal alone even when scaled to much larger budgets, after which returns diminish. These effects are domain-dependent: in mathematics, the native domain of RLVR-style post-training, reasoning models achieve higher uncertainty quality and exhibit both stronger complementarity and faster scaling than in STEM or humanities.
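
To make the two black-box signals concrete, the following is a minimal Python sketch of how self-consistency, verbalized confidence, and a hybrid of the two could be computed from a handful of parallel samples, along with a rank-based AUROC to score the result. The abstract does not specify the combination rule, so the weighted average used here (and the `weight` parameter, the function names, and the choice to average stated confidence only over samples that agree with the majority answer) are illustrative assumptions, not the authors' estimator.

```python
from collections import Counter


def self_consistency(answers):
    """Return (majority answer, fraction of samples that agree with it).

    On a tie, the answer that appeared first among the samples wins.
    """
    counts = Counter(answers)
    majority, votes = counts.most_common(1)[0]
    return majority, votes / len(answers)


def mean_verbalized_confidence(answers, confidences, target):
    """Average the model's stated confidence over samples that answered `target`."""
    values = [c for a, c in zip(answers, confidences) if a == target]
    return sum(values) / len(values)


def hybrid_confidence(answers, confidences, weight=0.5):
    """Assumed combination rule: weighted average of the two signals."""
    majority, sc = self_consistency(answers)
    vc = mean_verbalized_confidence(answers, confidences, majority)
    return majority, weight * sc + (1 - weight) * vc


def auroc(scores, correct):
    """Probability that a correct answer gets a higher score than an incorrect one.

    Ties between a correct and an incorrect answer count as 0.5.
    """
    pos = [s for s, c in zip(scores, correct) if c]
    neg = [s for s, c in zip(scores, correct) if not c]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three hypothetical samples for one question: final answers and stated confidences.
    answers = ["42", "42", "17"]
    confidences = [0.9, 0.8, 0.3]
    print(hybrid_confidence(answers, confidences))
```

In this sketch, the sampling budget is simply the length of `answers`; with two samples, self-consistency can only take the values 0.5 or 1.0, which is one intuition for why pairing it with verbalized confidence may help at small budgets.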
