2603.27490v1 Mar 29, 2026 cs.CL

AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents

Jingren Zhou (Citations: 15,205; h-index: 27)
Chen Qian (Citations: 307; h-index: 8)
Pengjun Xie (Citations: 1,161; h-index: 16)
Bryan Hooi (Citations: 1,219; h-index: 17)
Xiaotian Zhang (Citations: 127; h-index: 5)
Zuozhu Liu (Citations: 614; h-index: 13)
Liangcai Su (Citations: 23; h-index: 3)
Zhaopeng Feng (Citations: 250; h-index: 9)
Zhen Zhang (Citations: 414; h-index: 8)
Xinyu Wang (Citations: 653; h-index: 10)
Xiaobin Wang (Citations: 143; h-index: 7)
Runnan Fang (Citations: 770; h-index: 13)
Qi Zhang (Citations: 15; h-index: 3)
Baixuan Li (Citations: 603; h-index: 8)
Shihao Cai (Citations: 179; h-index: 7)
Rui Ye (Citations: 136; h-index: 4)
Hui Chen (Citations: 18; h-index: 3)
Jiang Yong (Citations: 60; h-index: 3)
J. Zhou (Citations: 393; h-index: 10)

Original Abstract

As large language models (LLMs) evolve into autonomous agents for long-horizon information-seeking, managing finite context capacity has become a critical bottleneck. Existing context management methods typically commit to a single fixed strategy throughout the entire trajectory. Such static designs may work well in some states, but they cannot adapt as the usefulness and reliability of the accumulated context evolve during long-horizon search. To formalize this challenge, we introduce a probabilistic framework that characterizes long-horizon success through two complementary dimensions: search efficiency and terminal precision. Building on this perspective, we propose AgentSwing, a state-aware adaptive parallel context management routing framework. At each trigger point, AgentSwing expands multiple context-managed branches in parallel and uses lookahead routing to select the most promising continuation. Experiments across diverse benchmarks and agent backbones show that AgentSwing consistently outperforms strong static context management methods, often matching or exceeding their performance with up to $3\times$ fewer interaction turns while also improving the ultimate performance ceiling of long-horizon web agents. Beyond the empirical gains, the proposed probabilistic framework provides a principled lens for analyzing and designing future context management strategies for long-horizon agents.
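
One way to read the two dimensions named in the abstract is as a factorization of the success probability. The notation below is an illustrative assumption, not the paper's own formulation: $B$ is a hypothetical interaction budget, $\tau$ is the turn at which the agent stops with an answer, and "correct" is the event that this answer is right.

```latex
P(\text{success})
  = \underbrace{P(\tau \le B)}_{\text{search efficiency}}
    \cdot
    \underbrace{P(\text{correct} \mid \tau \le B)}_{\text{terminal precision}}
```

Under this reading, a context strategy that finishes quickly but answers wrongly, and one that answers accurately but rarely finishes within the budget, both yield a low product, which is one plausible motivation for choosing the strategy per state rather than fixing it in advance.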

4 Citations

0 Influential

13.5 Altmetric

71.5 Score

AI Analysis

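Summary

This paper proposes AgentSwing, a state-aware adaptive parallel context management framework, to address the finite context capacity that constrains LLM-based long-horizon web agents. Existing approaches apply one fixed context management strategy (for example, discarding everything, keeping only recent history, or summarizing) throughout the search. AgentSwing instead branches whenever the context limit is reached, applying several strategies in parallel. Each branch then performs a lookahead, interacting with the environment for a few turns, and the continuation with the highest likelihood of success is selected dynamically. By balancing search efficiency against terminal precision, the method achieves top-level performance across a range of benchmarks.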

Key Innovations

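Probabilistic framework: introduces the first probabilistic framework that decomposes the success of long-horizon information-seeking agents into two complementary dimensions, search efficiency and terminal precision.
Parallel Context Management: when a given threshold is reached, the agent does not rely on a single context management technique but runs several heterogeneous strategies (Discard-All, Keep-Last-N, Summary) simultaneously to generate diverse trajectory candidates.
Lookahead Routing: rather than judging from the managed context state alone, the mechanism picks the best continuation based on the outcome after each branch has interacted with the environment for a further K turns.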
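
The parallel branching and lookahead routing described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Context` representation, the three strategy functions, `route`, `toy_rollout`, and `toy_score` are all hypothetical names, the summary step is a string placeholder rather than an LLM call, and the scoring rule is a toy stand-in because the routing criterion is not specified in this page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: entry 0 is the task, the rest are past turns.
Context = List[str]


def discard_all(ctx: Context) -> Context:
    """Discard-All: drop the whole history, keeping only the task."""
    return ctx[:1]


def keep_last_n(ctx: Context, n: int = 2) -> Context:
    """Keep-Last-N: keep the task plus the n most recent turns."""
    tail = ctx[1:]
    return ctx[:1] + (tail[-n:] if n > 0 else [])


def summarize(ctx: Context) -> Context:
    """Summary: placeholder for an LLM-written digest of the history."""
    return ctx[:1] + [f"summary of {len(ctx) - 1} turns"]


def route(
    ctx: Context,
    strategies: Dict[str, Callable[[Context], Context]],
    rollout: Callable[[Context, int], Context],
    score: Callable[[Context], float],
    k: int = 3,
) -> Tuple[str, Context]:
    """Expand one branch per strategy, roll each forward k turns in
    parallel, and return the name and context of the best-scoring branch."""
    names = list(strategies)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        branches = list(pool.map(lambda n: rollout(strategies[n](ctx), k), names))
    scores = [score(b) for b in branches]
    best = max(range(len(names)), key=scores.__getitem__)
    return names[best], branches[best]


def toy_rollout(ctx: Context, k: int) -> Context:
    """Toy environment: append k dummy turns (assumption, no real browsing)."""
    return ctx + [f"step {i}" for i in range(k)]


def toy_score(ctx: Context) -> float:
    """Toy scorer: count turns that still carry evidence (assumption)."""
    return float(sum(turn.startswith("evidence") for turn in ctx))


STRATEGIES = {
    "discard_all": discard_all,
    "keep_last_n": keep_last_n,
    "summary": summarize,
}

if __name__ == "__main__":
    history = ["task: find X", "noise", "noise", "evidence: A", "noise"]
    name, branch = route(history, STRATEGIES, toy_rollout, toy_score, k=3)
    print(name)  # → keep_last_n
```

In this toy run only Keep-Last-N retains the evidence turn, so the router picks it. With a different history (for instance, one polluted by misleading turns) a real scorer could just as well favor Discard-All or Summary, which is the point of deciding per trigger rather than once.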

Learning & Inference Impact

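AgentSwing is a test-time scaling technique that does not update model parameters, so it requires no additional training. At inference time it spawns several branches in parallel at each context trigger point and runs a lookahead on each, so API calls and token usage (compute overhead) rise temporarily at those points. In return, it helps keep the agent from getting stuck on incorrect information or repeating fruitless search loops because a static context strategy did not fit the situation. Over the whole trajectory this sharply reduces unnecessary interaction turns (up to 3× fewer than existing methods), so a short-term compute investment raises both long-horizon token efficiency and the ceiling on the success rate.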
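
The overhead-versus-savings trade-off can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that the lookahead turns of the selected branch are kept as part of the trajectory, so only the discarded branches count as overhead; the function name and all numbers are hypothetical and are not taken from the paper.

```python
def lookahead_overhead(triggers: int, branches: int, k: int) -> int:
    """Turns spent on branches that are thrown away after routing.

    Assumes the winning branch's k lookahead turns are reused, so each
    trigger wastes (branches - 1) * k turns.
    """
    if triggers < 0 or branches < 1 or k < 0:
        raise ValueError("triggers and k must be >= 0, branches must be >= 1")
    return triggers * (branches - 1) * k


if __name__ == "__main__":
    # Hypothetical run: 4 trigger points, 3 strategies, 3 lookahead turns each.
    print(lookahead_overhead(triggers=4, branches=3, k=3))  # → 24
```

With these made-up numbers the routing costs 24 extra turns in total. Whether that pays off depends on how many turns the better-chosen continuation saves over the rest of the trajectory, which is an empirical question the page answers only with the "up to 3× fewer turns" figure.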

Technical Difficulty

Intermediate

Estimated implementation complexity based on methodology.
