2603.01045v1 Mar 01, 2026 cs.MA

Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems

Cao Liu (Citations: 7, h-index: 2)

Ke Zeng (Citations: 56, h-index: 4)

Feiran Liu (Citations: 13, h-index: 1)

Yizhou Shan (Citations: 449, h-index: 10)

Xinyi Huang (Citations: 18, h-index: 2)

Yue Zhu (Citations: 68, h-index: 4)

Xuxin Cheng (Citations: 73, h-index: 2)

Wenyuan Jiang (Citations: 9, h-index: 2)

Yuzhe Zhang (Citations: 4, h-index: 1)

Xin Yang (Citations: 4, h-index: 1)

Terry Jingchen Zhang (Citations: 40, h-index: 2)

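Large language models are increasingly used in multi-agent systems that distribute information across agents to overcome context limitations. Whether agents can reliably compute over that distributed information, rather than merely exchange it, remains an open question. This paper introduces Silo-Bench, a role-agnostic benchmark of 30 algorithmic tasks spanning three levels of communication complexity, evaluated in 54 configurations over 1,620 experiments. The results reveal a fundamental Communication-Reasoning Gap: agents spontaneously form task-appropriate coordination structures and actively exchange information, yet systematically fail to integrate the distributed state into correct answers. The failure is localized to the reasoning-integration stage; agents often acquire sufficient information but struggle to integrate it. The coordination overhead accumulates as scale grows and eventually cancels the gains from parallelization entirely. These results show that naively increasing the number of agents cannot overcome context limitations, and Silo-Bench provides a foundation for tracking progress toward genuinely collaborative multi-agent systems.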

Original Abstract

Large language models are increasingly deployed in multi-agent systems to overcome context limitations by distributing information across agents. Yet whether agents can reliably compute with distributed information -- rather than merely exchange it -- remains an open question. We introduce Silo-Bench, a role-agnostic benchmark of 30 algorithmic tasks across three communication complexity levels, evaluating 54 configurations over 1,620 experiments. Our experiments expose a fundamental Communication-Reasoning Gap: agents spontaneously form task-appropriate coordination topologies and exchange information actively, yet systematically fail to synthesize distributed state into correct answers. The failure is localized to the reasoning-integration stage -- agents often acquire sufficient information but cannot integrate it. This coordination overhead compounds with scale, eventually eliminating parallelization gains entirely. These findings demonstrate that naively scaling agent count cannot circumvent context limitations, and Silo-Bench provides a foundation for tracking progress toward genuinely collaborative multi-agent systems.
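The distinction the abstract draws between exchanging distributed information and computing with it can be made concrete with a toy sketch. Everything below is hypothetical and written only for illustration: `ToyAgent`, `run_global_max`, and the all-to-all message pattern are not taken from Silo-Bench, and the reading that the 1,620 experiments correspond to one run per task and configuration pair is an assumption that merely happens to match the quoted figures. The agents here are deterministic, so the integration step always succeeds; the abstract reports that LLM agents tend to fail at exactly that step.

```python
from dataclasses import dataclass, field

# Figures quoted in the abstract.
NUM_TASKS = 30
NUM_CONFIGURATIONS = 54
NUM_EXPERIMENTS = 1_620

# Assumption (not stated in the abstract): one run per (task, configuration) pair.
assert NUM_TASKS * NUM_CONFIGURATIONS == NUM_EXPERIMENTS


@dataclass
class ToyAgent:
    """Hypothetical agent holding one shard of the input (not a Silo-Bench class)."""
    shard: list[int]
    inbox: list[int] = field(default_factory=list)

    def local_summary(self) -> int:
        # What this agent can compute from its own shard alone.
        return max(self.shard)

    def integrate(self) -> int:
        # Reasoning-integration step: combine the local result with received messages.
        return max([self.local_summary(), *self.inbox])


def run_global_max(shards: list[list[int]]) -> list[int]:
    """Each agent sends its local maximum to every other agent, then integrates."""
    agents = [ToyAgent(shard=s) for s in shards]
    # Communication step: all-to-all exchange of local summaries.
    for i, sender in enumerate(agents):
        for j, receiver in enumerate(agents):
            if i != j:
                receiver.inbox.append(sender.local_summary())
    # Integration step: every agent produces its own answer.
    return [agent.integrate() for agent in agents]
```

In this sketch, `run_global_max([[3, 9], [4], [7, 1]])` has every agent converge on the same global maximum. With n agents the all-to-all exchange sends n(n-1) messages, which gives a rough intuition for why coordination overhead can grow with scale.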

1 Citation

0 Influential

5 Altmetric

26.0 Score
