2602.04634v1 Feb 04, 2026 cs.AI

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Zelai Xu

Zhexuan Xu

Ruize Zhang

Tsinghua University

Chunyang Zhu

Shi Yu

Weilin Liu

Quanlu Zhang

Wenbo Ding

Chao Yu

Yu Wang

Original Abstract

Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
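
The lead-agent-subagent pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `run_subagent`, `lead_agent`, `SubagentResult`, and the placeholder task decomposition are hypothetical names invented for illustration, the LLM and tool calls are replaced by a stub, and `item_f1` is a generic set-based F1 that may differ from the WideSearch benchmark's official scorer.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SubagentResult:
    subtask: str
    items: set[str] = field(default_factory=set)


async def run_subagent(subtask: str) -> SubagentResult:
    """Hypothetical subagent: handles one subtask in its own isolated context."""
    # Stand-in for multi-turn LLM reasoning and tool use (assumption: a real
    # system would call a model and search/browse tools here).
    await asyncio.sleep(0)
    # Placeholder finding so the sketch runs without a model or network access.
    return SubagentResult(subtask, {f"{subtask}:fact"})


async def lead_agent(query: str, width: int) -> set[str]:
    """Hypothetical lead agent: decompose, fan out in parallel, merge results."""
    # Placeholder decomposition; a trained lead agent would plan the subtasks.
    subtasks = [f"{query}#{i}" for i in range(width)]
    # Subagents run concurrently rather than taking turns.
    results = await asyncio.gather(*(run_subagent(s) for s in subtasks))
    merged: set[str] = set()
    for result in results:
        merged |= result.items
    return merged


def item_f1(predicted: set[str], gold: set[str]) -> float:
    """Generic set-based F1 over items (assumed metric shape, not the official scorer)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    found = asyncio.run(lead_agent("q", width=3))
    print(sorted(found))
```

In this sketch, "width" is simply the number of subagents launched concurrently; raising it lets more subtasks be covered in the same number of sequential steps, which is the dimension the abstract calls width scaling.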
