2602.16485v1 Feb 18, 2026 cs.CL

Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling

Junyi Liu (Citations: 82, h-index: 2)
Yiren Zhao (Citations: 3,104, h-index: 21)
Jeffrey T. H. Wong (Citations: 32, h-index: 3)
Zixi Zhang (Citations: 61, h-index: 1)

Abstract

Existing Multi-Agent Systems (MAS) typically rely on static, homogeneous model configurations, limiting their ability to exploit the distinct strengths of differently post-trained models. To address this, we introduce Team-of-Thoughts, a novel MAS architecture that leverages the complementary capabilities of heterogeneous agents via an orchestrator-tool paradigm. Our framework introduces two key mechanisms to optimize performance: (1) an orchestrator calibration scheme that identifies models with superior coordination capabilities, and (2) a self-assessment protocol where tool agents profile their own domain expertise to account for variations in post-training skills. During inference, the orchestrator dynamically activates the most suitable tool agents based on these proficiency profiles. Experiments on five reasoning and code generation benchmarks show that Team-of-Thoughts delivers consistently superior task performance. Notably, on AIME24 and LiveCodeBench, our approach achieves accuracies of 96.67% and 72.53%, respectively, substantially outperforming homogeneous role-play baselines, which score 80% and 65.93%.
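
The abstract says the orchestrator activates tool agents based on self-assessed proficiency profiles, but it does not say how those profiles are represented or how selection is done. The following is only a minimal sketch of one way such routing could look, not the authors' method. The `ToolAgent` class, the `select_agents` function, the domain labels, the scores, and the `top_k` and `min_score` parameters are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolAgent:
    """A tool agent with a self-reported proficiency profile (hypothetical structure)."""
    name: str
    # Maps a domain label to a self-assessed score in [0, 1]; all values below are invented.
    proficiency: dict[str, float] = field(default_factory=dict)


def select_agents(agents: list[ToolAgent], domain: str, top_k: int = 2,
                  min_score: float = 0.5) -> list[ToolAgent]:
    """Return up to top_k agents whose self-assessed score for `domain` is at least min_score."""
    eligible = [a for a in agents if a.proficiency.get(domain, 0.0) >= min_score]
    # Highest self-assessed score first; ties are broken by name for determinism.
    eligible.sort(key=lambda a: (-a.proficiency.get(domain, 0.0), a.name))
    return eligible[:top_k]


# Illustrative team; the agent names and scores are assumptions, not data from the paper.
team = [
    ToolAgent("agent_a", {"math": 0.9, "code": 0.4}),
    ToolAgent("agent_b", {"math": 0.6, "code": 0.8}),
    ToolAgent("agent_c", {"math": 0.3, "code": 0.7}),
]

math_team = [a.name for a in select_agents(team, "math")]
code_team = [a.name for a in select_agents(team, "code")]
print(math_team)  # ['agent_a', 'agent_b']
print(code_team)  # ['agent_b', 'agent_c']
```

In this sketch a different subset of agents is activated per task domain, which is the heterogeneous, dynamic behavior the abstract contrasts with static, homogeneous configurations. A real orchestrator would presumably be a language model making this choice rather than a fixed threshold rule.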
