2606.16995v1 Jun 15, 2026 cs.AI

When in Doubt, Plan It Out: Committed Small Language Model Deliberation for Reactive Reinforcement Learning

Nathan Gavenski
Odinaldo Rodrigues
Juarez Monteiro
Adriano Veloso
Francisco Galuppo

Reinforcement Learning (RL) policies often degrade in unfamiliar environments because they lack explicit deliberation. We propose Plan, Align, Commit, Think (PACT), a hybrid architecture that combines a fast, reactive RL policy with a slow, deliberative Small Language Model (SLM) planner. PACT invokes the SLM asynchronously to generate and validate candidate action plans. Once a plan is verified through simulation as safe, feasible, and complete, it is executed directly, bypassing the RL policy without retraining or modifying it. Evaluated on three FrozenLake configurations of increasing difficulty, PACT outperforms all baselines while relying on a 2B-parameter SLM backbone, suggesting that deliberative planning and reactive execution are more powerful in concert than either is alone in these settings.
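
The verify-then-commit loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical deterministic 4x4 FrozenLake-style map (no slipping), stubs both the RL policy and the SLM planner as plain Python callables, and calls the planner synchronously whenever no plan is committed, whereas the paper invokes the SLM asynchronously. The names `GRID`, `step`, `verify_plan`, and `run_episode` are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int]

# Hypothetical map (assumption): S start, F frozen, H hole, G goal.
GRID = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]
MOVES = {"L": (0, -1), "D": (1, 0), "R": (0, 1), "U": (-1, 0)}


def step(state: State, action: str) -> State:
    """Deterministic transition; a move off the grid leaves the state unchanged."""
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]):
        return (r, c)
    return state


def verify_plan(state: State, plan: List[str]) -> bool:
    """Simulate a candidate plan and accept it only if it is
    feasible (known actions), safe (never enters a hole),
    and complete (ends on the goal)."""
    for action in plan:
        if action not in MOVES:
            return False
        state = step(state, action)
        if GRID[state[0]][state[1]] == "H":
            return False
    return GRID[state[0]][state[1]] == "G"


def run_episode(
    policy: Callable[[State], str],
    planner: Callable[[State], Optional[List[str]]],
    start: State = (0, 0),
    max_steps: int = 50,
) -> Tuple[State, List[str]]:
    """Act with the reactive policy unless a verified plan is committed."""
    state, committed, trace = start, [], []
    for _ in range(max_steps):
        if not committed:
            candidate = planner(state)  # stand-in for the SLM call
            if candidate and verify_plan(state, candidate):
                committed = list(candidate)  # commit: bypass the policy
        action = committed.pop(0) if committed else policy(state)
        state = step(state, action)
        trace.append(action)
        if GRID[state[0]][state[1]] in "HG":
            break
    return state, trace


if __name__ == "__main__":
    always_right = lambda s: "R"             # stub reactive policy
    good_planner = lambda s: list("DDRDRR")  # stub planner output
    print(run_episode(always_right, good_planner))
```

In this sketch the policy is never modified: a rejected or missing plan simply falls through to `policy(state)`, which mirrors the abstract's claim that the RL policy needs no retraining.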
