2604.21794v1 Apr 23, 2026 cs.AI

Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems

Haibo Jin

Ye Yu

Xiaopeng Yuan

Heming Liu

Peng Kuang

Haohan Wang

Abstract

Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations such as key-value caches offers a promising alternative to text-based protocols, but existing approaches do not jointly optimize communication with multi-agent reasoning. Therefore we propose DiffMAS, a training framework that treats latent communication as a learnable component of multi-agent systems. DiffMAS performs parameter-efficient supervised training over multi-agent latent trajectories, enabling agents to jointly learn how information should be encoded and interpreted across interactions. Experiments on mathematical reasoning, scientific QA, code generation, and commonsense benchmarks show that DiffMAS consistently improves reasoning accuracy and decoding stability over single-agent inference, text-based multi-agent systems, and prior latent communication methods, achieving 26.7% on AIME24, 20.2% on GPQA-Diamond, and consistent gains across reasoning benchmarks.
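
The abstract does not give implementation details, so the following is only a toy sketch of the general idea of latent communication through a key-value cache: a receiver agent attends over a sender's cached key/value pairs alongside its own, and a small interface parameter is tuned against a supervised target. Every name and number here (`sender_cache`, `receiver_cache`, the scalar `gain`, the target vector) is hypothetical, and the scalar gain with finite-difference gradient descent is a stand-in chosen for brevity; it is not the DiffMAS training procedure.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Single-head scaled dot-product attention over cached keys/values.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical toy caches: lists of (key, value) pairs.
sender_cache = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [1.0, -1.0])]
receiver_cache = [([1.0, 1.0], [0.0, 2.0])]

def receiver_output(query, gain):
    # Assumed interface: scale the sender's values by one learnable scalar,
    # then let the receiver attend over the shared and local entries together.
    shared = [(k, [gain * x for x in v]) for k, v in sender_cache]
    cache = shared + receiver_cache
    keys = [k for k, _ in cache]
    values = [v for _, v in cache]
    return attend(query, keys, values)

def loss(gain, query, target):
    # Squared error between the receiver's output and a supervised target.
    out = receiver_output(query, gain)
    return sum((o - t) ** 2 for o, t in zip(out, target))

def train_gain(gain, query, target, lr=1.0, steps=20, eps=1e-4):
    # Finite-difference gradient descent on the scalar gain (illustrative only).
    for _ in range(steps):
        grad = (loss(gain + eps, query, target)
                - loss(gain - eps, query, target)) / (2 * eps)
        gain -= lr * grad
    return gain

query = [1.0, 0.0]
target = [0.8, 0.8]  # hypothetical supervision signal
out = receiver_output(query, gain=1.0)
trained_gain = train_gain(1.0, query, target)
loss_before = loss(1.0, query, target)
loss_after = loss(trained_gain, query, target)
```

In this sketch the loss on the receiver's output drives the update of the communication parameter, which is the sense in which the channel is optimized together with the downstream prediction rather than treated as a fixed interface.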
