2604.17009v1 Apr 18, 2026 cs.AI

Small Model as Master Orchestrator: Learning Unified Agent-Tool Orchestration with Parallel Subtask Decomposition

Shengji Tang (Citations: 100, h-index: 5)

Lei Bai (Citations: 48, h-index: 4)

Peng Ye (Citations: 63, h-index: 5)

Tao Chen (Citations: 20, h-index: 1)

Wanli Ouyang (Citations: 3,571, h-index: 19)

Wutao Xiong (Citations: 4, h-index: 1)

Ting Liu (Citations: 9, h-index: 2)

Yuzhuo Fu (Citations: 93, h-index: 5)

Wenzhen Yuan (Citations: 33, h-index: 2)

Fanchen Yu (Citations: 104, h-index: 6)

Original Abstract

Multi-agent systems (MAS) demonstrate clear advantages in tackling complex problems by coordinating diverse agents and external tools. However, most existing orchestration methods rely on static workflows or serial agent scheduling, and are further constrained by heterogeneous interface protocols between tools and agents. This leads to high system complexity and poor extensibility. To mitigate these issues, we propose Agent-as-Tool, a unified parallel orchestration paradigm that abstracts both agents and tools into a standardized, learnable action space with protocol normalization and explicit state feedback. Building on this paradigm, we train a lightweight orchestrator, ParaManager, which decouples planning decisions from subtask solving, enabling state-aware parallel subtask decomposition, delegation, and asynchronous execution. For training, we adopt a two-stage ParaManager training pipeline. It improves robustness by incorporating supervised fine-tuning (SFT) trajectories equipped with recovery mechanisms, and further applies reinforcement learning (RL) to achieve an optimal balance among task success, protocol compliance, diversity, and reasoning efficiency. Experiments show that ParaManager achieves strong performance across multiple benchmarks and exhibits robust generalization under unseen model pools.
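To make the abstract's central idea concrete, here is a minimal Python sketch of a unified action space in which agents and tools share one call signature, subtasks are dispatched in parallel, and every call returns explicit status feedback. It is an illustration under stated assumptions, not the paper's implementation: the names `ActionResult`, `call_action`, `REGISTRY`, `calculator_tool`, `echo_agent`, and `dispatch_parallel` are all hypothetical, and the two handlers are toy stand-ins for real tools and agents.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical normalized protocol: every agent or tool takes a subtask
# string and returns a result string asynchronously.
Handler = Callable[[str], Awaitable[str]]


@dataclass
class ActionResult:
    """Explicit state feedback returned to the orchestrator for each call."""
    name: str
    status: str  # "ok" or "error"
    output: str


async def calculator_tool(subtask: str) -> str:
    # Toy tool (assumption): adds two integers written as "a+b".
    left, right = subtask.split("+")
    return str(int(left) + int(right))


async def echo_agent(subtask: str) -> str:
    # Toy agent (assumption): stands in for a call to a worker model.
    await asyncio.sleep(0.01)
    return f"answer to: {subtask}"


# Agents and tools live in the same registry, so the orchestrator
# selects among them through one action space.
REGISTRY: Dict[str, Handler] = {
    "calculator": calculator_tool,
    "echo_agent": echo_agent,
}


async def call_action(name: str, subtask: str) -> ActionResult:
    """Run one action and convert any failure into a status record."""
    handler = REGISTRY.get(name)
    if handler is None:
        return ActionResult(name, "error", f"unknown action: {name}")
    try:
        return ActionResult(name, "ok", await handler(subtask))
    except Exception as exc:  # report failures as state instead of raising
        return ActionResult(name, "error", str(exc))


async def dispatch_parallel(plan: List[Tuple[str, str]]) -> List[ActionResult]:
    """Execute all (action, subtask) pairs concurrently, preserving plan order."""
    return await asyncio.gather(*(call_action(n, s) for n, s in plan))


def run(plan: List[Tuple[str, str]]) -> List[ActionResult]:
    return asyncio.run(dispatch_parallel(plan))
```

For example, `run([("calculator", "2+3"), ("echo_agent", "summarize")])` returns one `ActionResult` per subtask in plan order. A failed call comes back with `status == "error"` rather than raising, which is one simple way an orchestrator could observe failures and plan a recovery step.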

Citations: 0 | Influential: 0 | Altmetric: 9.5 | Score: 47.5
