2602.01011v3 Feb 01, 2026 cs.MA

Multi-Agent Teams Hold Experts Back

James Zou

Citations: 325

h-index: 9

Aneesh Pappu

Citations: 2,763

h-index: 5

Batu El

Citations: 43

h-index: 4

Hancheng Cao

Citations: 896

h-index: 8

C. D. Nolfo

Citations: 3,695

h-index: 7

Yanchao Sun

Citations: 167

h-index: 6

Meng Cao

Citations: 79

h-index: 4

Original Abstract

Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 37.6%. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
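The abstract's notion of strong synergy, and its contrast between averaging views and weighting expertise, can be made concrete with a small sketch. This is an illustration only, not the authors' evaluation code. It assumes scores where higher is better and computes performance loss relative to the expert's score, although the abstract does not say how the 37.6% figure is calculated. All function names and numbers below are hypothetical.

```python
from typing import Sequence


def has_strong_synergy(team_score: float, member_scores: Sequence[float]) -> bool:
    """Strong synergy: the team matches or exceeds its best individual member.

    Assumes higher scores are better.
    """
    return team_score >= max(member_scores)


def relative_loss(team_score: float, expert_score: float) -> float:
    """Fraction of the expert's score that the team fails to reach.

    Assumption: loss is measured relative to the expert's score; the
    abstract does not specify the exact formula behind its numbers.
    """
    return (expert_score - team_score) / expert_score


def aggregate(estimates: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of numeric estimates; equal weights give a plain average."""
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Hypothetical scores: the expert scores 80, the others 50 and 40, the team 60.
member_scores = [80, 50, 40]
team_score = 60
print(has_strong_synergy(team_score, member_scores))
print(relative_loss(team_score, max(member_scores)))

# Hypothetical numeric estimates, with the expert's estimate listed first.
estimates = [10, 20, 30]
print(aggregate(estimates, [1, 1, 1]))  # compromise: every view counts equally
print(aggregate(estimates, [8, 1, 1]))  # the expert's view dominates
```

With equal weights the aggregate drifts toward the non-experts' estimates, a toy version of the "integrative compromise" the abstract describes. Weighting the expert more heavily keeps the result close to the expert's estimate.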

4 Citations

0 Influential

4.5 Altmetric

26.5 Score
