2602.12662v1 Feb 13, 2026 cs.AI

Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents

Ruihan Yang (Citations: 604, h-index: 8)
F. Ye (Citations: 81, h-index: 5)
Xiang We (Citations: 2, h-index: 1)
K. Luo (Citations: 37, h-index: 2)
Xinbo Xu (Citations: 941, h-index: 4)
Bo Zhao (Citations: 6, h-index: 2)
Ruotian Ma (Citations: 99, h-index: 5)
Shanyi Wang (Citations: 66, h-index: 3)
Zhaopeng Tu (Citations: 135, h-index: 6)
Xiaolong Li (Citations: 132, h-index: 6)
Deqing Yang (Citations: 9, h-index: 2)
Linus (Citations: 617, h-index: 11)
Ruo-Tong Zhao (Citations: 36, h-index: 4)

Original Abstract

Large language models (LLMs) are increasingly deployed as autonomous agents for multi-turn decision-making tasks. However, current agents typically rely on fixed cognitive patterns: non-thinking models generate immediate responses, while thinking models engage in deep reasoning uniformly. This rigidity is inefficient for long-horizon tasks, where cognitive demands vary significantly from step to step, with some requiring strategic planning and others only routine execution. In this paper, we introduce CogRouter, a framework that trains agents to dynamically adapt cognitive depth at each step. Grounded in ACT-R theory, we design four hierarchical cognitive levels ranging from instinctive responses to strategic planning. Our two-stage training approach includes Cognition-aware Supervised Fine-tuning (CoSFT) to instill stable level-specific patterns, and Cognition-aware Policy Optimization (CoPO) for step-level credit assignment via confidence-aware advantage reweighting. The key insight is that appropriate cognitive depth should maximize the confidence of the resulting action. Experiments on ALFWorld and ScienceWorld demonstrate that CogRouter achieves state-of-the-art performance with superior efficiency. With Qwen2.5-7B, it reaches an 82.3% success rate, outperforming GPT-4o (+40.3%), OpenAI-o3 (+18.3%), and GRPO (+14.0%), while using 62% fewer tokens.
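The key insight stated in the abstract, that an appropriate cognitive depth should maximize the confidence of the resulting action, can be illustrated with a small sketch. This is not the authors' implementation: the `Candidate` type, the use of the geometric-mean token probability as the confidence measure, and the tie-break toward shallower levels are all assumptions made for illustration, and the abstract does not specify how confidence is computed or how CoPO uses it during training.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate step produced at a given cognitive level (hypothetical type)."""
    level: int                    # 1 = instinctive response ... 4 = strategic planning
    action: str                   # the action text emitted at this level
    action_logprobs: List[float]  # per-token log-probabilities of the action (assumed input)


def action_confidence(logprobs: List[float]) -> float:
    """Assumed confidence measure: geometric-mean token probability, in (0, 1]."""
    if not logprobs:
        raise ValueError("action must contain at least one token")
    return math.exp(sum(logprobs) / len(logprobs))


def pick_level(candidates: List[Candidate]) -> Candidate:
    """Return the candidate whose action is most confident.

    Ties are broken toward the shallower (cheaper) level; this tie-break
    is an illustrative choice, not something the abstract states.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(
        candidates,
        key=lambda c: (action_confidence(c.action_logprobs), -c.level),
    )


if __name__ == "__main__":
    step = [
        Candidate(level=1, action="go to desk 1", action_logprobs=[-2.0, -2.0]),
        Candidate(level=3, action="open drawer 2", action_logprobs=[-0.1, -0.1]),
        Candidate(level=4, action="open drawer 2", action_logprobs=[-0.1, -0.1]),
    ]
    best = pick_level(step)
    print(best.level, best.action)
```

In this toy step, levels 3 and 4 yield equally confident actions, so the sketch prefers level 3 and avoids spending extra reasoning tokens. That mirrors the efficiency motivation in the abstract, though the trained router presumably selects a level directly rather than sampling every level at inference time.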

Citations: 2 (0 influential) · Altmetric: 5.5 · Score: 29.5
