2602.11437v1 Feb 11, 2026 cs.AI

Distributionally Robust Cooperative Multi-Agent Reinforcement Learning via Robust Value Factorization

Chengrui Qu

Citations: 3

h-index: 1

Adam Wierman

Citations: 215

h-index: 9

Christopher Yeh

Citations: 10

h-index: 2

Kishan Panaganti

Citations: 396

h-index: 10

Eric Mazumdar

Citations: 93

h-index: 6

Abstract

Cooperative multi-agent reinforcement learning (MARL) commonly adopts centralized training with decentralized execution, where value-factorization methods enforce the individual-global-maximum (IGM) principle so that decentralized greedy actions recover the team-optimal joint action. However, this recipe remains unreliable in real-world settings because of environmental uncertainties arising from the sim-to-real gap, model mismatch, and system noise. We address this gap by introducing Distributionally Robust IGM (DrIGM), a principle that requires each agent's robust greedy action to align with the robust team-optimal joint action. We show that DrIGM holds for a novel definition of robust individual action values, which is compatible with decentralized greedy execution and yields a provable robustness guarantee for the whole system. Building on this foundation, we derive DrIGM-compliant robust variants of existing value-factorization architectures (e.g., VDN/QMIX/QTRAN) that (i) train on robust Q-targets, (ii) preserve scalability, and (iii) integrate seamlessly with existing codebases without bespoke per-agent reward shaping. Empirically, on high-fidelity SustainGym simulators and a StarCraft game environment, our methods consistently improve out-of-distribution performance. Code and data are available at https://github.com/crqu/robust-coMARL.
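
As a rough illustration of the standard (non-robust) IGM principle mentioned in the abstract, the sketch below checks that per-agent greedy actions recover the jointly optimal action under a VDN-style additive factorization. It does not reproduce DrIGM, the paper's robust individual action values, or its robust Q-targets. The Q-values are made up for illustration, and the function names (`greedy_joint_action`, `vdn_joint_q`, `brute_force_best`) are hypothetical and are not taken from the authors' repository.

```python
import itertools

import numpy as np


def greedy_joint_action(per_agent_q):
    """Decentralized execution: each agent takes the argmax of its own Q-values."""
    return tuple(int(np.argmax(q)) for q in per_agent_q)


def vdn_joint_q(per_agent_q, joint_action):
    """VDN-style additive factorization: Q_tot(a) = sum_i Q_i(a_i)."""
    return float(sum(q[a] for q, a in zip(per_agent_q, joint_action)))


def brute_force_best(per_agent_q):
    """Centralized reference: enumerate every joint action and keep the best."""
    action_spaces = [range(len(q)) for q in per_agent_q]
    return max(
        itertools.product(*action_spaces),
        key=lambda joint: vdn_joint_q(per_agent_q, joint),
    )


# Illustrative (made-up) individual Q-values for three agents at one state.
per_agent_q = [
    np.array([0.1, 0.9, 0.3]),
    np.array([1.2, -0.4]),
    np.array([0.0, 0.5, 2.0, 1.0]),
]

decentralized = greedy_joint_action(per_agent_q)
centralized = brute_force_best(per_agent_q)

# IGM holds here: the decentralized greedy choice equals the joint argmax.
igm_holds = decentralized == centralized
print(decentralized, centralized, igm_holds)
```

The additive form makes the joint maximization separable across agents, which is why the two answers agree. According to the abstract, DrIGM asks for the analogous alignment between each agent's robust greedy action and the robust team-optimal joint action.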

0 Citations

0 Influential

30.493061443341 Altmetric

152.5 Score
