2603.16673v1 Mar 17, 2026 cs.RO

When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making

Jun Liu (Citations: 102; h-index: 7)
Pu Zhao (Citations: 32; h-index: 2)
Zhenglun Kong, Harvard University (Citations: 1,300; h-index: 18)
Xuan Shen (Citations: 507; h-index: 14)
Peiyan Dong (Citations: 912; h-index: 15)
Lin Cui (Citations: 1; h-index: 1)
Geng Yuan (Citations: 80; h-index: 7)
Wei Niu (Citations: 119; h-index: 7)
Wenbin Zhang (Citations: 146; h-index: 3)
Yanzhi Wang (Citations: 183; h-index: 4)
Dong Huang (Citations: 64; h-index: 5)
Fan Yang (Citations: 11; h-index: 2)
Hao Tang (Citations: 117; h-index: 7)
Xue Lin (Citations: 205; h-index: 10)
Gaowen Liu (Citations: 13; h-index: 2)

Original Abstract

Embodied robotic systems increasingly rely on large language model (LLM)-based agents to support high-level reasoning, planning, and decision-making during interactions with the environment. However, invoking LLM reasoning introduces substantial computational latency and resource overhead, which can interrupt action execution and reduce system reliability. Excessive reasoning may delay actions, while insufficient reasoning often leads to incorrect decisions and task failures. This raises a fundamental question for embodied agents: when should the agent reason, and when should it act? In this work, we propose RARRL (Resource-Aware Reasoning via Reinforcement Learning), a hierarchical framework for resource-aware orchestration of embodied agents. Rather than learning low-level control policies, RARRL learns a high-level orchestration policy that operates at the agent's decision-making layer. This policy enables the agent to adaptively determine whether to invoke reasoning, which reasoning role to employ, and how much computational budget to allocate based on current observations, execution history, and remaining resources. Extensive experiments, including evaluations with empirical latency profiles derived from the ALFRED benchmark, show that RARRL consistently improves task success rates while reducing execution latency and enhancing robustness compared with fixed or heuristic reasoning strategies. These results demonstrate that adaptive reasoning control is essential for building reliable and efficient embodied robotic agents.
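To make the shape of the orchestration decision concrete, the following is a minimal Python sketch of the three choices the abstract names: whether to invoke reasoning, which reasoning role to use, and how much computational budget to allocate, conditioned on the current observation, execution history, and remaining resources. It is not the authors' implementation. The role names (`planner`, `verifier`), the budget levels, the state fields, and the hand-written `placeholder_policy` are all assumptions introduced for illustration; in RARRL the policy is learned with reinforcement learning, which this sketch does not attempt.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional

# Hypothetical reasoning roles and budget levels; the abstract does not list them.
ROLES = ("planner", "verifier")
BUDGETS = (64, 256)  # assumed units: tokens of LLM reasoning


@dataclass(frozen=True)
class OrchestrationAction:
    role: Optional[str]  # None means "act directly, no reasoning call"
    budget: int          # 0 when no reasoning is invoked


@dataclass(frozen=True)
class OrchestrationState:
    observation_uncertainty: float  # assumed summary of the current observation, in [0, 1]
    recent_failures: int            # assumed summary of the execution history
    remaining_budget: int           # remaining resources, same units as BUDGETS


def action_space() -> List[OrchestrationAction]:
    """Enumerate 'act directly' plus every (role, budget) combination."""
    actions = [OrchestrationAction(None, 0)]
    actions += [OrchestrationAction(r, b) for r, b in product(ROLES, BUDGETS)]
    return actions


def feasible_actions(state: OrchestrationState) -> List[OrchestrationAction]:
    """Keep only the actions whose budget fits the remaining resources."""
    return [a for a in action_space() if a.budget <= state.remaining_budget]


def placeholder_policy(state: OrchestrationState) -> OrchestrationAction:
    """Hand-written stand-in for the learned policy (illustration only)."""
    feasible = feasible_actions(state)
    reasoning = [a for a in feasible if a.role is not None]
    confident = state.observation_uncertainty < 0.5 and state.recent_failures == 0
    if not reasoning or confident:
        return feasible[0]  # the 'act directly' action is always first
    role = "verifier" if state.recent_failures > 0 else "planner"
    candidates = [a for a in reasoning if a.role == role]
    return max(candidates, key=lambda a: a.budget)  # largest budget that fits
```

A learned policy would replace `placeholder_policy` with a function trained to trade task success against latency; the feasibility filter shows one simple way the remaining-resource constraint could be enforced regardless of how the policy is obtained.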
