Science Cast

When in Doubt, Plan It Out: Committed Small Language Model Deliberation for Reactive Reinforcement Learning

librarian, June 16, 2026 1:11pm

Voice is AI-generated

Connected to paper. This paper is a preprint and has not been certified by peer review.

arXiv | PDF | June 15, 2026 12:00am

Authors

Nathan Gavenski, Juarez Monteiro, Francisco Galuppo, Adriano Veloso, Odinaldo Rodrigues

Abstract

Reinforcement Learning (RL) policies often degrade in unfamiliar environments because they lack explicit deliberation. We propose Plan, Align, Commit, Think (PACT), a hybrid architecture that combines a fast, reactive RL policy with a slow, deliberative Small Language Model (SLM) planner. PACT invokes the SLM asynchronously to generate and validate candidate action plans. Once a plan is verified through simulation as safe, feasible, and complete, it is executed directly, bypassing the RL policy without retraining or modifying it. Evaluated on three FrozenLake configurations of increasing difficulty, PACT outperforms all baselines while relying on a 2B-parameter SLM backbone, suggesting that deliberative planning and reactive execution are more powerful in concert than either is alone in these settings.
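
The control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy deterministic grid, the function names (`step`, `validate_plan`, `run_episode`, `stub_planner`, `reactive_policy`), and the hard-coded plan are all assumptions made for illustration. In particular, `stub_planner` is a hypothetical stand-in for the SLM, `reactive_policy` is a placeholder for a trained RL policy, and the planner is called synchronously here, whereas the abstract says PACT invokes the SLM asynchronously.

```python
from typing import Callable, List, Optional, Tuple

# Toy deterministic 4x4 grid in the FrozenLake style (assumed layout):
# S = start, F = frozen (safe), H = hole, G = goal.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
MOVES = {"L": (0, -1), "D": (1, 0), "R": (0, 1), "U": (-1, 0)}
State = Tuple[int, int]


def step(state: State, action: str) -> State:
    """Deterministic transition; a move off the grid leaves the state unchanged."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]):
        return (nr, nc)
    return state


def validate_plan(state: State, plan: List[str]) -> bool:
    """Simulate a candidate plan and accept it only if it is
    feasible (known actions), safe (no holes), and complete (ends at the goal)."""
    for action in plan:
        if action not in MOVES:
            return False  # infeasible: unknown action
        state = step(state, action)
        if GRID[state[0]][state[1]] == "H":
            return False  # unsafe: falls into a hole
    return GRID[state[0]][state[1]] == "G"


def run_episode(policy: Callable[[State], str],
                planner: Callable[[State], Optional[List[str]]],
                start: State = (0, 0),
                max_steps: int = 50) -> Tuple[bool, str]:
    """Return (reached_goal, mode), where mode is 'committed' if a
    validated plan was ever adopted and 'reactive' otherwise."""
    state, committed, mode = start, [], "reactive"
    for _ in range(max_steps):
        if not committed:
            candidate = planner(state)
            if candidate and validate_plan(state, candidate):
                committed = list(candidate)  # commit: the policy is bypassed
                mode = "committed"
        # Execute the committed plan if one exists; otherwise act reactively.
        action = committed.pop(0) if committed else policy(state)
        state = step(state, action)
        cell = GRID[state[0]][state[1]]
        if cell == "G":
            return True, mode
        if cell == "H":
            return False, mode
    return False, mode


def stub_planner(state: State) -> Optional[List[str]]:
    """Hypothetical stand-in for the SLM: proposes one fixed plan from the start."""
    return ["D", "D", "R", "R", "D", "R"] if state == (0, 0) else None


def reactive_policy(state: State) -> str:
    """Placeholder for a trained RL policy; always moves down."""
    return "D"
```

With these stubs, `run_episode(reactive_policy, stub_planner)` commits to the validated plan and reaches the goal, while `run_episode(reactive_policy, lambda s: None)` falls back to the placeholder policy, which walks into a hole. The policy itself is never modified in either case, which mirrors the abstract's point that the plan bypasses the RL policy without retraining it.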
