2604.01664v1 Apr 02, 2026 cs.AI

ContextBudget: Budget-Aware Context Management for Long-Horizon Search Agents

Yonghui Wu, Tianze Xu, Yanzhao Zheng, Zhentao Zhang, Yuanqiang Yu, Jihuai Zhu, Chao Ma, Binbin Lin, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu

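LLM-based agents show strong potential for long-horizon reasoning, but deployment factors (e.g., memory, latency, and cost) limit their context size, leaving them with a constrained context budget. As the interaction history grows, an agent must trade off retaining past information against staying within the context limit. To address this, we propose Budget-Aware Context Management (BACM), which formulates context management as a sequential decision problem under a context budget constraint. BACM lets the agent assess its available budget before incorporating new information and decide when, and how much of, the interaction history to compress. We also develop BACM-RL, a curriculum-based reinforcement learning approach that learns compression strategies under varying context budgets. In experiments on compositional multi-objective QA and long-horizon web browsing benchmarks, BACM-RL outperformed prior methods across model scales and task complexities. In particular, it achieved gains of more than 1.6x over strong baselines in high-complexity settings and kept its advantage as the budget shrank, whereas most methods showed declining performance under smaller budgets.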

Original Abstract

LLM-based agents show strong potential for long-horizon reasoning, yet their context size is limited by deployment factors (e.g., memory, latency, and cost), yielding a constrained context budget. As interaction histories grow, this induces a trade-off between retaining past information and staying within the context limit. To address this challenge, we propose Budget-Aware Context Management (BACM), which formulates context management as a sequential decision problem with a context budget constraint. It enables agents to assess the available budget before incorporating new observations and decide when and how much of the interaction history to compress. We further develop BACM-RL, an end-to-end curriculum-based reinforcement learning approach that learns compression strategies under varying context budgets. Experiments on compositional multi-objective QA and long-horizon web browsing benchmarks show that BACM-RL consistently outperforms prior methods across model scales and task complexities, achieving over $1.6\times$ gains over strong baselines in high-complexity settings, while maintaining strong advantages as budgets shrink, where most methods exhibit a downward performance trend.
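
The budget check described in the abstract (assess the remaining budget before adding a new observation, then decide how much history to compress) can be illustrated with a minimal sketch. This is not the authors' BACM or BACM-RL: the learned compression policy is replaced here by a fixed oldest-first heuristic, tokens are approximated by whitespace-separated words, and truncation stands in for LLM summarization. The names `BudgetedContext`, `count_tokens`, and `truncate` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def truncate(text: str, max_tokens: int) -> str:
    """Stand-in compressor: keep only the first max_tokens words.
    A real agent would summarize with the LLM instead."""
    return " ".join(text.split()[:max_tokens])


@dataclass
class BudgetedContext:
    budget: int  # maximum number of tokens allowed in the context
    history: List[str] = field(default_factory=list)

    def used(self) -> int:
        """Tokens currently held in the interaction history."""
        return sum(count_tokens(entry) for entry in self.history)

    def add(self, observation: str) -> None:
        """Check the remaining budget first, then compress the oldest
        entries by exactly the amount needed to fit the new observation."""
        obs = truncate(observation, self.budget)  # a single entry may not exceed the budget
        overflow = self.used() + count_tokens(obs) - self.budget
        i = 0
        while overflow > 0 and i < len(self.history):
            size = count_tokens(self.history[i])
            keep = max(size - overflow, 0)
            self.history[i] = truncate(self.history[i], keep)
            overflow -= size - keep
            i += 1
        # Drop entries that were compressed down to nothing.
        self.history = [entry for entry in self.history if entry]
        self.history.append(obs)


ctx = BudgetedContext(budget=10)
ctx.add("a b c d e f")
ctx.add("g h i j k l")   # the oldest entry is shortened to make room
print(ctx.history, ctx.used())
```

In this sketch the decision of when to compress is made only when the budget would otherwise be exceeded, and the decision of how much is the exact overflow. The paper instead learns both decisions with reinforcement learning under varying budgets.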
