2604.02226v1 Apr 02, 2026 cs.AI

When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning

Nathan Gavenski

Juarez Monteiro

Gianlucca L. Zuin

Adriano Veloso

Original Abstract

Reinforcement learning (RL) agents often struggle with out-of-distribution (OOD) scenarios, leading to high uncertainty and random behavior. While language models (LMs) contain valuable world knowledge, larger ones incur high computational costs, hindering real-time use, and exhibit limitations in autonomous planning. We introduce Adaptive Safety through Knowledge (ASK), which combines smaller LMs with trained RL policies to enhance OOD generalization without retraining. ASK employs Monte Carlo Dropout to assess uncertainty and queries the LM for action suggestions only when uncertainty exceeds a set threshold. This selective use preserves the efficiency of existing policies while leveraging the language model's reasoning in uncertain situations. In experiments on the FrozenLake environment, ASK shows no improvement in-domain, but demonstrates robust navigation in transfer tasks, achieving a reward of 0.95. Our findings indicate that effective neuro-symbolic integration requires careful orchestration rather than simple combination, highlighting the need for sufficient model scale and effective hybridization mechanisms for successful OOD generalization.
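The gating step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how uncertainty is computed from the Monte Carlo Dropout samples, so the predictive entropy of the averaged action distribution is used here as an assumption. The names `stochastic_policy`, `lm_suggest`, `ask_action`, and the threshold value are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_probs(
    stochastic_policy: Callable[[int], Sequence[float]],
    state: int,
    n_samples: int = 20,
) -> List[float]:
    """Average action probabilities over several dropout-enabled passes.

    `stochastic_policy` is assumed to return action logits with dropout
    left active, so repeated calls on the same state can differ.
    """
    sums: List[float] = []
    for _ in range(n_samples):
        probs = softmax(stochastic_policy(state))
        if not sums:
            sums = [0.0] * len(probs)
        sums = [s + p for s, p in zip(sums, probs)]
    return [s / n_samples for s in sums]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; used here as the uncertainty score (assumption)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ask_action(
    state: int,
    stochastic_policy: Callable[[int], Sequence[float]],
    lm_suggest: Callable[[int], int],
    threshold: float,
    n_samples: int = 20,
) -> Tuple[int, bool]:
    """Return (action, used_lm). The LM is queried only above the threshold."""
    mean_probs = mc_dropout_probs(stochastic_policy, state, n_samples)
    if entropy(mean_probs) > threshold:
        return lm_suggest(state), True
    best = max(range(len(mean_probs)), key=lambda a: mean_probs[a])
    return best, False


# Toy stand-ins: a policy that is confident on state 0 and flat elsewhere,
# and an "LM" that always suggests action 2.
def toy_policy(state: int) -> List[float]:
    return [10.0, 0.0, 0.0, 0.0] if state == 0 else [0.0, 0.0, 0.0, 0.0]


def toy_lm(state: int) -> int:
    return 2


in_domain = ask_action(0, toy_policy, toy_lm, threshold=0.5)
out_of_domain = ask_action(7, toy_policy, toy_lm, threshold=0.5)
```

In this toy setup the confident state is handled by the policy alone, while the flat distribution on the unfamiliar state crosses the threshold and defers to the LM stand-in. With a real network, the threshold would need tuning, since it controls how often the more expensive LM call is made.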
