2604.05297v1 Apr 07, 2026 cs.AI

Overcoming the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning

Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning

Haodong Jing

Citations: 102

h-index: 6

Lesong Tao

Citations: 3

h-index: 1

Yifei Wang

Citations: 42

h-index: 4

Jingwen Fu

Xi'an Jiaotong University

Citations: 164

h-index: 8

Miao Kang

Citations: 142

h-index: 4

Shitao Chen

Citations: 726

h-index: 15

Nanning Zheng

Citations: 26

h-index: 3

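Value factorization, widely used in multi-agent reinforcement learning (MARL), has significant theoretical and algorithmic limitations, and its tendency to converge to suboptimal solutions remains poorly understood and unresolved. Existing analyses focus mainly on the optimal case and therefore cannot explain this phenomenon. To bridge this gap, we introduce the "stable point", a new theoretical concept that characterizes the potential convergence of value factorization in the general case. By analyzing the distribution of stable points in existing methods, we find that non-optimal stable points are the main cause of performance degradation. Algorithmically, however, making the optimal action the only stable point is nearly infeasible. Making non-optimal actions unstable and filtering them out iteratively is instead a more practical route to global optimality. Building on this idea, we propose a new Multi-Round Value Factorization (MRVF) framework. Specifically, MRVF measures a non-negative payoff increment relative to the previously selected action, which renders inferior actions unstable and drives each iteration toward a stable point with a better action. Experiments on challenging benchmarks, including predator-prey tasks and the StarCraft II Multi-Agent Challenge (SMAC), validate our stable-point analysis and show that MRVF outperforms state-of-the-art methods.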

Original Abstract

Value factorization, a popular paradigm in MARL, faces significant theoretical and algorithmic bottlenecks: its tendency to converge to suboptimal solutions remains poorly understood and unsolved. Theoretically, existing analyses fail to explain this due to their primary focus on the optimal case. To bridge this gap, we introduce a novel theoretical concept: the stable point, which characterizes the potential convergence of value factorization in general cases. Through an analysis of stable point distributions in existing methods, we reveal that non-optimal stable points are the primary cause of poor performance. However, algorithmically, making the optimal action the unique stable point is nearly infeasible. In contrast, iteratively filtering suboptimal actions by rendering them unstable emerges as a more practical approach for global optimality. Inspired by this, we propose a novel Multi-Round Value Factorization (MRVF) framework. Specifically, by measuring a non-negative payoff increment relative to the previously selected action, MRVF transforms inferior actions into unstable points, thereby driving each iteration toward a stable point with a superior action. Experiments on challenging benchmarks, including predator-prey tasks and StarCraft II Multi-Agent Challenge (SMAC), validate our analysis of stable points and demonstrate the superiority of MRVF over state-of-the-art methods.
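
The multi-round idea described in the abstract can be illustrated on a small matrix game. The sketch below is a toy under stated assumptions and not the paper's algorithm: the payoff table, the additive (VDN-style) least-squares fit over a uniform action distribution, and the function names `fit_additive`, `greedy`, and `multi_round` are all hypothetical choices made for illustration. Only the use of a non-negative payoff increment relative to the previously selected action is taken from the abstract.

```python
# Toy two-agent matrix game; the payoff values are an illustrative assumption.
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def fit_additive(target):
    """Least-squares fit of target[i][j] ~ q1[i] + q2[j] with uniform weights.

    For a full table the solution is row mean + column mean - grand mean;
    the grand mean is folded into q1 here.
    """
    n_rows, n_cols = len(target), len(target[0])
    row_means = [sum(row) / n_cols for row in target]
    col_means = [
        sum(target[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    grand_mean = sum(row_means) / n_rows
    q1 = [m - grand_mean for m in row_means]
    q2 = col_means
    return q1, q2


def greedy(q1, q2):
    """Decentralized greedy choice: each agent maximizes its own utility."""
    a1 = max(range(len(q1)), key=lambda i: q1[i])
    a2 = max(range(len(q2)), key=lambda j: q2[j])
    return a1, a2


def multi_round(payoff, max_rounds=5):
    """Refit on the clipped increment over the current action, round by round."""
    q1, q2 = fit_additive(payoff)  # round 1: plain additive factorization
    action = greedy(q1, q2)
    history = [action]
    for _ in range(max_rounds - 1):
        base = payoff[action[0]][action[1]]
        # Non-negative increment relative to the previously selected action.
        increment = [[max(r - base, 0.0) for r in row] for row in payoff]
        q1, q2 = fit_additive(increment)
        candidate = greedy(q1, q2)
        if payoff[candidate[0]][candidate[1]] <= base:
            break  # no strictly better action found; keep the current one
        action = candidate
        history.append(action)
    return action, history


if __name__ == "__main__":
    final_action, rounds = multi_round(PAYOFF)
    print(final_action, rounds)
```

On this table the plain additive fit selects a joint action with payoff 0, because the large penalties around the payoff-8 cell pull down the corresponding per-agent utilities. Clipping the targets to the increment over that action removes the penalties from the fit, and the next round moves to the payoff-8 action. Whether this behavior carries over to learned, sampled settings is exactly what the paper's analysis addresses; the toy makes no claim about it.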

