2604.22452v1 Apr 24, 2026 cs.AI

Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents

Yunze Xiao

Language Technology Institute

Xirui Li

Tianyi Zhou

Ryan Wong

Dianqi Li

Timothy Baldwin

Ming Li

Original Abstract

Collective intelligence refers to the ability of a group to achieve outcomes beyond what any individual member can accomplish alone. As large language model agents scale to populations of millions, a key question arises: Does collective intelligence emerge spontaneously from scale? We present the first empirical evaluation of this question in a large-scale autonomous agent society. Studying MoltBook, a platform hosting over two million agents, we introduce Superminds Test, a hierarchical framework that probes society-level intelligence using controlled Probing Agents across three tiers: joint reasoning, information synthesis, and basic interaction. Our experiments reveal a stark absence of collective intelligence. The society fails to outperform individual frontier models on complex reasoning tasks, rarely synthesizes distributed information, and often fails even trivial coordination tasks. Platform-wide analysis further shows that interactions remain shallow, with threads rarely extending beyond a single reply and most responses being generic or off-topic. These results suggest that collective intelligence does not emerge from scale alone. Instead, the dominant limitation of current agent societies is extremely sparse and shallow interaction, which prevents agents from exchanging information and building on each other's outputs.
