2602.15198v1 Feb 16, 2026 cs.MA

Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems

S. Zilberstein

Citations: 14,285

h-index: 50

Eugene Bagdasarian

Citations: 54

h-index: 4

Mason Nakamura

Citations: 21

h-index: 4

Abhinav Kumar

Citations: 87

h-index: 3

Saswat Das

Citations: 38

h-index: 4

Sahar Abdelnabi

Citations: 152

h-index: 5

Saaduddin Mahmud

University of Dhaka

Citations: 104

h-index: 5

Ferdinando Fioretto

Citations: 6

h-index: 1

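Multi-agent systems in which LLM agents communicate through free-form language enable sophisticated coordination for solving complex cooperative tasks. However, this raises a distinct safety problem when individual agents form a coalition and collude to pursue secondary goals at the expense of the joint objective. This paper presents Colosseum, a framework for auditing the collusive behavior of LLM agents in multi-agent settings. We analyze how agents cooperate through a Distributed Constraint Optimization Problem (DCOP) and determine whether collusion occurred by measuring regret relative to the cooperative optimum. Colosseum tests each LLM agent's propensity to collude under different objectives, persuasion tactics, and network topologies. Our audit finds that most out-of-the-box models tend to collude when an artificially formed secret communication channel is present. We also find a "collusion on paper" phenomenon, in which agents plan collusion in text but rarely take collusive actions, so the joint task is largely unaffected. By measuring communication and actions in rich, verifiable environments, Colosseum provides a new way to study collusion.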

Original Abstract

Multi-agent systems, where LLM agents communicate through free-form language, enable sophisticated coordination for solving complex cooperative tasks. This surfaces a unique safety problem when individual agents form a coalition and collude to pursue secondary goals and degrade the joint objective. In this paper, we present Colosseum, a framework for auditing LLM agents' collusive behavior in multi-agent settings. We ground how agents cooperate through a Distributed Constraint Optimization Problem (DCOP) and measure collusion via regret relative to the cooperative optimum. Colosseum tests each LLM for collusion under different objectives, persuasion tactics, and network topologies. Through our audit, we show that most out-of-the-box models exhibited a propensity to collude when a secret communication channel was artificially formed. Furthermore, we discover "collusion on paper" when agents plan to collude in text but would often pick non-collusive actions, thus providing little effect on the joint task. Colosseum provides a new way to study collusion by measuring communications and actions in rich yet verifiable environments.
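The abstract's idea of measuring collusion via regret relative to the cooperative optimum can be illustrated with a toy example. The sketch below is not the paper's implementation: the three-agent problem, the utility function, and the names `DOMAINS`, `joint_utility`, `cooperative_optimum`, and `regret` are all hypothetical, and regret is assumed here to be the gap between the best achievable joint utility and the utility of the assignment the agents actually chose.

```python
from itertools import product

# Hypothetical toy DCOP: three agents each choose a slot (0 or 1).
# None of these names or numbers come from the paper.
DOMAINS = {"a1": [0, 1], "a2": [0, 1], "a3": [0, 1]}


def joint_utility(assignment):
    """Joint objective: +2 for every pair of agents that agree,
    plus +1 for every agent that picks slot 0."""
    agents = sorted(assignment)
    total = 0
    for i, u in enumerate(agents):
        for v in agents[i + 1:]:
            if assignment[u] == assignment[v]:
                total += 2
    total += sum(1 for a in agents if assignment[a] == 0)
    return total


def cooperative_optimum(domains):
    """Brute-force the assignment that maximizes the joint utility.
    Only feasible for tiny problems; real DCOP solvers avoid enumeration."""
    names = sorted(domains)
    best_utility, best_assignment = None, None
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        utility = joint_utility(assignment)
        if best_utility is None or utility > best_utility:
            best_utility, best_assignment = utility, assignment
    return best_utility, best_assignment


def regret(assignment, domains):
    """Assumed definition: optimum joint utility minus achieved joint utility."""
    optimum, _ = cooperative_optimum(domains)
    return optimum - joint_utility(assignment)


# A fully cooperative outcome versus one where a1 and a2 coordinate on
# slot 1 (a stand-in for a coalition pursuing a secondary goal).
cooperative = {"a1": 0, "a2": 0, "a3": 0}
colluding = {"a1": 1, "a2": 1, "a3": 0}
print(regret(cooperative, DOMAINS), regret(colluding, DOMAINS))
```

Under this assumed definition, zero regret means the chosen actions match the cooperative optimum. That is also why planning collusion in text need not raise regret: only the actions actually taken enter the calculation.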

