2604.08232v1 Apr 09, 2026 cs.AI

HiRO-Nav: Efficient Embodied Navigation via Hybrid Reasoning

HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

Yijun Yang

Citations: 51

h-index: 4

Zichuan Lin

Citations: 664

h-index: 12

Chunyan Miao

Citations: 191

h-index: 6

Hengwei Zhao

Citations: 8

h-index: 2

Deheng Ye

Citations: 30

h-index: 2

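Embodied navigation agents built on large reasoning models (LRMs) can process complex, multimodal environmental information and reason step by step in context, improving sequential decision-making toward long-horizon goals. A key question remains, however: how can the reasoning capabilities of LRMs be used efficiently for long-horizon navigation tasks? In simple environments an agent should act reflexively, whereas in complex ones it should reason carefully before acting. To this end, we propose the Hybrid ReasOning Navigation (HiRO-Nav) agent, the first agent that adaptively decides at every step whether to reason, based on its action entropy. Specifically, analyzing how the agent's action entropy changes along navigation trajectories shows that only a small fraction of actions have high entropy, and that these actions often lead the agent to new scenes or important objects. Studying the relationship between action entropy and task completion (i.e., Q-value) further shows that improving high-entropy actions has a more positive effect on task success. We therefore propose a training pipeline that uses hybrid supervised fine-tuning as a cold start, followed by online reinforcement learning with a hybrid reasoning strategy that explicitly activates reasoning only for high-entropy actions, substantially reducing computational cost while improving decision quality. Extensive experiments on the CHORES-$\mathbb{S}$ ObjectNav benchmark show that HiRO-Nav strikes a better balance between success rate and token efficiency than both dense-thinking and no-thinking baselines.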

Original Abstract

Embodied navigation agents built upon large reasoning models (LRMs) can handle complex, multimodal environmental input and perform grounded reasoning at each step to improve sequential decision-making for long-horizon tasks. However, a critical question remains: how can the reasoning capabilities of LRMs be harnessed intelligently and efficiently for long-horizon navigation tasks? In simple scenes, agents are expected to act reflexively, while in complex ones they should engage in deliberate reasoning before acting. To achieve this, we introduce the Hybrid ReasOning Navigation (HiRO-Nav) agent, the first agent capable of adaptively determining whether to think at every step based on its own action entropy. Specifically, by examining how the agent's action entropy evolves over navigation trajectories, we observe that only a small fraction of actions exhibit high entropy, and that these actions often steer the agent toward novel scenes or critical objects. Furthermore, studying the relationship between action entropy and task completion (i.e., Q-value) reveals that improving high-entropy actions contributes more positively to task success. Hence, we propose a tailored training pipeline comprising hybrid supervised fine-tuning as a cold start, followed by online reinforcement learning with the proposed hybrid reasoning strategy, which explicitly activates reasoning only for high-entropy actions, significantly reducing computational overhead while improving decision quality. Extensive experiments on the CHORES-$\mathbb{S}$ ObjectNav benchmark show that HiRO-Nav achieves a better trade-off between success rate and token efficiency than both dense-thinking and no-thinking baselines.
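
The abstract does not say how action entropy is computed or how "high entropy" is decided, so the following is only a minimal sketch of the gating idea. It assumes Shannon entropy over a discrete action distribution and a fixed threshold. The function names, the threshold value, and the example probabilities are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def action_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution.

    Assumption: the policy exposes a probability for each candidate action.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_reason(probs: Sequence[float], threshold: float) -> bool:
    """Hypothetical gate: reason only when the action entropy exceeds the threshold."""
    return action_entropy(probs) > threshold


# Illustrative distributions over four candidate actions (made-up numbers).
confident = [0.94, 0.02, 0.02, 0.02]  # one action dominates: act reflexively
uncertain = [0.25, 0.25, 0.25, 0.25]  # no clear preference: reason first
THRESHOLD = 1.0  # assumed value, not from the paper

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    mode = "think" if should_reason(probs, THRESHOLD) else "act"
    print(f"{name}: entropy={action_entropy(probs):.3f} -> {mode}")
```

In this sketch the near-deterministic distribution falls below the threshold and skips reasoning, while the uniform one exceeds it and triggers reasoning. A real system might instead derive the threshold from the entropy statistics observed along trajectories.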

