2601.12030v1 Jan 17, 2026 cs.AI

ARC: Active and Reflection-driven Context Management for Long-Horizon Information Seeking Agents

Tong Yang, Yilun Yao, Elsie Dai, Zhewen Tan, Zhenyu Duan, Shanshan Huang, Shousheng Jia, Yanbing Jiang

Abstract

Large language models are increasingly deployed as research agents for deep search and long-horizon information seeking, yet their performance often degrades as interaction histories grow. This degradation, known as context rot, reflects a failure to maintain a coherent, task-relevant internal state over extended reasoning horizons. Existing approaches manage context primarily through raw accumulation or passive summarization, treating it as a static artifact and allowing early errors or misplaced emphasis to persist. Motivated by this limitation, we propose ARC, the first framework to systematically formulate context management as an active, reflection-driven process that treats context as a dynamic internal reasoning state during execution. ARC operationalizes this view through reflection-driven monitoring and revision, allowing agents to actively reorganize their working context when misalignment or degradation is detected. Experiments on challenging long-horizon information-seeking benchmarks show that ARC consistently outperforms passive context compression methods, achieving an absolute accuracy improvement of up to 11 percentage points on BrowseComp-ZH with Qwen2.5-32B-Instruct.
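
As a rough illustration of the contrast the abstract draws between passive accumulation and reflection-driven revision, the following minimal Python sketch shows one way such a loop could be structured. It is not the authors' implementation: the names `WorkingContext`, `step`, `too_many_off_task_notes`, and `keep_on_task_notes` are hypothetical, and the keyword-overlap check is a toy stand-in for what would presumably be a model-based reflection step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorkingContext:
    """Hypothetical working context: the task plus the notes the agent relies on."""
    task: str
    notes: List[str] = field(default_factory=list)


# Hypothetical signatures; in a real agent both would likely be LLM calls.
Reflector = Callable[[WorkingContext], bool]           # True if context looks degraded
Reviser = Callable[[WorkingContext], WorkingContext]   # returns a reorganized context


def step(ctx: WorkingContext, observation: str,
         is_degraded: Reflector, revise: Reviser) -> WorkingContext:
    """Add one observation, then reflect and revise only if a problem is detected."""
    ctx.notes.append(observation)
    if is_degraded(ctx):
        ctx = revise(ctx)
    return ctx


def _on_task(ctx: WorkingContext, note: str) -> bool:
    # Toy relevance test (an assumption): the note shares a word with the task.
    return bool(set(ctx.task.lower().split()) & set(note.lower().split()))


def too_many_off_task_notes(ctx: WorkingContext, limit: int = 2) -> bool:
    """Toy reflection check: flag the context once `limit` off-task notes pile up."""
    return sum(not _on_task(ctx, n) for n in ctx.notes) >= limit


def keep_on_task_notes(ctx: WorkingContext) -> WorkingContext:
    """Toy revision: rebuild the context from the notes that still match the task."""
    return WorkingContext(task=ctx.task, notes=[n for n in ctx.notes if _on_task(ctx, n)])


def run(task: str, observations: List[str]) -> WorkingContext:
    """Drive the loop over a fixed list of observations."""
    ctx = WorkingContext(task=task)
    for obs in observations:
        ctx = step(ctx, obs, too_many_off_task_notes, keep_on_task_notes)
    return ctx
```

In this sketch the context is rewritten only when the reflection check fires, rather than being summarized on a fixed schedule. That is the design point the abstract emphasizes, although the actual detection and revision criteria used by ARC are not specified in this listing.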
