2604.13472v1 Apr 15, 2026 cs.LG

Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus

Zijian Zhao

The Hong Kong University of Science and Technology

Citations: 173

h-index: 7

Sen Li

Citations: 5

h-index: 1

Jin Gao

Citations: 223

h-index: 5

Abstract

Cooperative multi-agent reinforcement learning (MARL) is widely used to address large joint observation and action spaces by decomposing a centralized control problem into multiple interacting agents. However, such decomposition often introduces additional challenges, including non-stationarity, unstable training, weak coordination, and limited theoretical guarantees. In this paper, we propose the Consensus Multi-Agent Transformer (CMAT), a centralized framework that bridges cooperative MARL to a hierarchical single-agent reinforcement learning (SARL) formulation. CMAT treats all agents as a unified entity and employs a Transformer encoder to process the large joint observation space. To handle the extensive joint action space, we introduce a hierarchical decision-making mechanism in which a Transformer decoder autoregressively generates a high-level consensus vector, simulating the process by which agents reach agreement on their strategies in latent space. Conditioned on this consensus, all agents generate their actions simultaneously, enabling order-independent joint decision making and avoiding the sensitivity to action-generation order in conventional Multi-Agent Transformers (MAT). This factorization allows the joint policy to be optimized using single-agent PPO while preserving expressive coordination through the latent consensus. To evaluate the proposed method, we conduct experiments on benchmark tasks from StarCraft II, Multi-Agent MuJoCo, and Google Research Football. The results show that CMAT achieves superior performance over recent centralized solutions, sequential MARL methods, and conventional MARL baselines. The code for this paper is available at: https://github.com/RS2002/CMAT
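
The factorization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes the consensus vector is already available as a fixed list of floats (in CMAT it would come from the Transformer decoder), that all agents share one hypothetical linear action head, and that actions are discrete. The names `agent_logits`, `joint_log_prob`, and `ppo_clip_objective` are invented for illustration. Under these assumptions, the joint log-probability is a sum of per-agent terms conditioned on the same consensus, so it does not depend on the order in which agents are listed, and a single PPO ratio can be formed for the whole joint action.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def agent_logits(obs_embedding, consensus, weights):
    """Hypothetical shared action head (an assumption, not CMAT's actual head).

    Concatenates the agent's observation embedding with the consensus
    vector (list + list is concatenation) and applies one linear map.
    """
    features = obs_embedding + consensus
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def joint_log_prob(obs_embeddings, consensus, actions, weights):
    """Log-probability of a joint action given a fixed consensus vector.

    Each agent's action depends only on its own embedding and the shared
    consensus, so the joint log-probability is a plain sum and does not
    depend on how the agents are ordered.
    """
    total = 0.0
    for h, a in zip(obs_embeddings, actions):
        probs = softmax(agent_logits(h, consensus, weights))
        total += math.log(probs[a])
    return total


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Standard single-agent PPO clipped surrogate for one sample.

    The ratio is computed once for the whole joint action, which is what
    treating the team as a single agent amounts to in this sketch.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Toy usage: three agents, 2-dim embeddings, 2-dim consensus, 3 actions.
obs = [[0.1, -0.3], [0.7, 0.2], [-0.5, 0.4]]
consensus = [0.25, -0.1]
weights = [[0.3, -0.2, 0.5, 0.1],
           [-0.4, 0.6, 0.2, -0.3],
           [0.1, 0.1, -0.7, 0.4]]
actions = [2, 0, 1]
logp = joint_log_prob(obs, consensus, actions, weights)
```

Reordering the agents (and their actions) in the toy usage leaves `logp` unchanged up to floating-point rounding. An autoregressive per-agent decoder, by contrast, conditions each agent's action on those generated before it, which is the order sensitivity the abstract attributes to MAT.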
