2604.11462v1 Apr 13, 2026 cs.AI

Overcoming the Context Bottleneck: Active Context Management for LLM Agents Using Reinforcement Learning

Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

Yang Li (Citations: 25, h-index: 3)

Xiaozhe Li (Citations: 3, h-index: 1)

Tianyi Lyu (Citations: 5, h-index: 1)

Siyi Yang (Citations: 3, h-index: 1)

Yizhao Yang (Citations: 16, h-index: 3)

Ligao Zhang (Citations: 0, h-index: 0)

Zhuoyi Huang (Citations: 50, h-index: 4)

Qingwen Liu (Citations: 959, h-index: 16)

Liang Shan (Citations: 63, h-index: 5)

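Large language models (LLMs) struggle with long-horizon tasks because of the "context bottleneck" and the "lost-in-the-middle" phenomenon. To address these problems, we propose a symbiotic framework that separates context management from task execution. The proposed architecture pairs ContextCurator, a lightweight specialized policy model, with TaskExecutor, a powerful but frozen foundation model. Trained with reinforcement learning, ContextCurator actively reduces information entropy in the working memory. It aggressively removes environmental noise while preserving the key data points that anchor reasoning. On WebArena, the proposed framework raises the success rate of Gemini-3.0-flash from 36.4% to 41.2% while cutting token usage by 8.8% (from 47.4K to 43.3K). On DeepSearch, it reaches a 57.1% success rate, outperforming 53.9%, and reduces token usage by a factor of 8. Notably, a 7B-parameter ContextCurator matches GPT-4o's context-management performance, offering a scalable and computationally efficient paradigm for autonomous long-horizon agents.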

Original Abstract

Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context management from task execution. Our architecture pairs a lightweight, specialized policy model, ContextCurator, with a powerful frozen foundation model, TaskExecutor. Trained via reinforcement learning, ContextCurator actively reduces information entropy in the working memory. It aggressively prunes environmental noise while preserving reasoning anchors, that is, sparse data points that are critical for future deductions. On WebArena, our framework improves the success rate of Gemini-3.0-flash from 36.4% to 41.2% while reducing token consumption by 8.8% (from 47.4K to 43.3K). On DeepSearch, it achieves a 57.1% success rate, compared with 53.9%, while reducing token consumption by a factor of 8. Remarkably, a 7B ContextCurator matches the context management performance of GPT-4o, providing a scalable and computationally efficient paradigm for autonomous long-horizon agents.
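The decoupling described in the abstract can be pictured as a loop in which a curator prunes the working memory before a frozen executor acts on it. The following is only a minimal sketch under stated assumptions: the abstract does not specify any interface, so `WorkingMemory`, `keyword_curator`, `run_episode`, and the sample observations are all hypothetical, and the keyword filter is a toy stand-in for the RL-trained ContextCurator policy, not the authors' method.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types: the abstract does not define an interface.
Curator = Callable[[List[str]], List[str]]  # stand-in for ContextCurator
Executor = Callable[[List[str]], str]       # stand-in for the frozen TaskExecutor


@dataclass
class WorkingMemory:
    entries: List[str] = field(default_factory=list)

    def token_count(self) -> int:
        # Crude whitespace-based count, for illustration only.
        return sum(len(entry.split()) for entry in self.entries)


def keyword_curator(keywords: List[str]) -> Curator:
    """Toy rule-based curator that keeps entries mentioning any keyword.

    Assumption: the paper's curator is a learned policy; this filter only
    illustrates where pruning happens in the loop.
    """
    def curate(entries: List[str]) -> List[str]:
        return [e for e in entries if any(k in e for k in keywords)]
    return curate


def run_episode(
    observations: List[str], curator: Curator, executor: Executor
) -> Tuple[WorkingMemory, List[str]]:
    memory = WorkingMemory()
    actions: List[str] = []
    for obs in observations:
        memory.entries.append(obs)
        # Prune the context before the executor ever sees it.
        memory.entries = curator(memory.entries)
        actions.append(executor(memory.entries))
    return memory, actions


# Usage with made-up observations and a dummy executor.
observations = ["banner: cookies accepted", "price: 42 USD", "footer: copyright"]
memory, actions = run_episode(
    observations,
    keyword_curator(["price"]),
    lambda context: f"act on {len(context)} entries",
)
print(memory.entries)
print(actions)
```

Keeping the executor a plain callable over the curated context mirrors the abstract's claim that the foundation model stays frozen: only the curator would be trained, and the executor never needs to know that pruning occurred.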

0 Citations

0 Influential

8 Altmetric

40.0 Score
