2603.08399v1 Mar 09, 2026 cs.LG

A Recipe for Stable Offline Multi-agent Reinforcement Learning

Dongsu Lee (Citations: 4, h-index: 1)
Daehee Lee (Citations: 44, h-index: 3)
Amy Zhang (Citations: 14, h-index: 2)

Abstract

Despite remarkable achievements in single-agent offline reinforcement learning (RL), multi-agent RL (MARL) has struggled to adopt this paradigm, largely persisting with on-policy training and self-play from scratch. One reason for this gap is the instability of non-linear value decomposition, which has led prior works to avoid complex mixing networks in favor of linear value decomposition (e.g., VDN) combined with the value regularization used in single-agent setups. In this work, we analyze the source of instability in non-linear value decomposition within the offline MARL setting. Our observations confirm that such non-linear decompositions induce value-scale amplification and unstable optimization. To alleviate this, we propose a simple technique, scale-invariant value normalization (SVN), that stabilizes actor-critic training without altering the Bellman fixed point. Empirically, we examine the interaction among key components of offline MARL (e.g., value decomposition, value learning, and policy extraction) and derive a practical recipe that unlocks its full potential.
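To make the contrast in the abstract concrete, the sketch below compares a linear, VDN-style decomposition (the joint value is the sum of per-agent values) with a simplified non-linear monotonic mixer. It is an illustration only and not the authors' code: the function names `vdn_mix` and `monotonic_mix`, the weight shapes, and the state-independent mixer are assumptions made for brevity, and SVN itself is not shown because the abstract does not specify it.

```python
import numpy as np


def vdn_mix(agent_qs):
    """Linear (VDN-style) decomposition: Q_tot is the sum of per-agent Q-values.

    agent_qs has shape (batch, n_agents); the result has shape (batch,).
    """
    return np.sum(agent_qs, axis=-1)


def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """Simplified non-linear mixer (hypothetical, state-independent).

    Taking the absolute value of the weights keeps Q_tot non-decreasing
    in every per-agent Q-value, which is the usual monotonicity constraint.
    """
    hidden = np.maximum(agent_qs @ np.abs(w1) + b1, 0.0)  # ReLU hidden layer
    return (hidden @ np.abs(w2) + b2).squeeze(-1)


# Toy inputs: one transition, three agents, four hidden units.
agent_qs = np.array([[1.0, 2.0, 3.0]])
w1, b1 = np.ones((3, 4)), np.zeros(4)
w2, b2 = np.ones((4, 1)), np.zeros(1)

q_linear = vdn_mix(agent_qs)
q_mixed = monotonic_mix(agent_qs, w1, b1, w2, b2)

# The linear sum preserves the scale of its inputs, while the mixer's
# output scale depends on the magnitude of its weights.
print(q_linear, q_mixed)
```

In this toy setup the mixer output is larger than the plain sum purely because of the weight magnitudes, which gives a rough intuition for how the scale of a mixed value can drift away from the scale of the per-agent values.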
