2602.04634v2 Feb 04, 2026 cs.AI

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Zelai Xu

Zhexuan Xu

Ruize Zhang

Tsinghua University

Chunyang Zhu

Shi Yu

Weilin Liu

Quanlu Zhang

Wenbo Ding

Chao Yu

Yu Wang

Original Abstract

Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
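The abstract describes a lead agent that hands work to parallel subagents, all served by one shared LLM but each with its own isolated context, and reports results as an item-level F1 score. The sketch below illustrates that fan-out pattern and a set-based F1 under stated assumptions. `shared_policy`, `AgentContext`, `run_subagent`, `lead_agent` and `item_f1` are hypothetical names, the model call is a stub, the lead agent runs a single dispatch round with pre-split subtasks, and items are matched by exact string equality. The actual WideSeek-R1 decomposition, MARL training loop and WideSearch matching rules are not shown here and may differ.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Per-agent message history; each subagent gets its own, so histories never mix."""
    role: str
    messages: list = field(default_factory=list)


async def shared_policy(ctx: AgentContext, prompt: str) -> str:
    """Hypothetical stand-in for the single shared LLM used by every agent.

    A real system would query a model here; this stub records the prompt in the
    caller's isolated context and returns a fake retrieved item.
    """
    ctx.messages.append(prompt)
    await asyncio.sleep(0)  # yield control, as a real async model/tool call would
    return f"item:{prompt}"


async def run_subagent(subtask: str) -> str:
    # Fresh, isolated context per subagent, but the same shared policy.
    ctx = AgentContext(role="subagent")
    return await shared_policy(ctx, subtask)


async def lead_agent(subtasks: list) -> set:
    # Width scaling: dispatch all subtasks concurrently instead of turn by turn,
    # then merge the subagents' outputs into one set of items.
    results = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    return set(results)


def item_f1(predicted: set, gold: set) -> float:
    """Set-based F1 over items, assuming exact-match comparison."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, `asyncio.run(lead_agent(["a", "b", "c"]))` yields `{"item:a", "item:b", "item:c"}`; scored against a gold set `{"item:a", "item:b", "item:d"}`, two of three items match on each side, so precision and recall are both 2/3 and the F1 is 2/3. Adding subagents here only increases concurrency; whether more parallel subagents improve accuracy is the empirical question the paper studies.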
