2604.00717v1 Apr 01, 2026 cs.MA

GRASP: Gradient Realignment via Active Shared Perception - A Framework for Multi-Agent Collaborative Optimization

GRASP: Gradient Realignment via Active Shared Perception for Multi-Agent Collaborative Optimization

Y. Ong

Citations: 10

h-index: 2

Sihan Zhou

Citations: 5

h-index: 2

Tiantian He

Citations: 380

h-index: 9

Yifan Lu

Citations: 469

h-index: 6

Yaqing Hou

Citations: 14

h-index: 2

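Non-stationarity caused by concurrent policy updates produces persistent environmental fluctuations. Existing methods such as Centralized Training with Decentralized Execution (CTDE) and sequential update schemes mitigate this problem. However, because an agent's perception of other agents' policies still depends on sampling environment interaction data, agents essentially operate in a passive perception state. This inevitably causes equilibrium oscillations and significantly slows the system's convergence. To address this, we propose a new framework called Gradient Realignment via Active Shared Perception (GRASP). GRASP defines the generalized Bellman equilibrium as a stable target for policy evolution. Its core mechanism derives a consensus gradient defined from the agents' independent gradients, which lets agents actively perceive policy updates and optimize team collaboration. Theoretically, we use the Kakutani fixed-point theorem to prove that the consensus direction $u^*$ guarantees the existence and attainability of this equilibrium. Extensive experiments on the StarCraft II Multi-Agent Challenge (SMAC) and Google Research Football (GRF) demonstrate the scalability and promising performance of the framework.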

Original Abstract

Non-stationarity arises from concurrent policy updates and leads to persistent environmental fluctuations. Existing approaches like Centralized Training with Decentralized Execution (CTDE) and sequential update schemes mitigate this issue. However, since the perception of the policies of other agents remains dependent on sampling environmental interaction data, the agent essentially operates in a passive perception state. This inevitably triggers equilibrium oscillations and significantly slows the convergence speed of the system. To address this issue, we propose Gradient Realignment via Active Shared Perception (GRASP), a novel framework that defines generalized Bellman equilibrium as a stable objective for policy evolution. The core mechanism of GRASP involves utilizing the independent gradients of agents to derive a defined consensus gradient, enabling agents to actively perceive policy updates and optimize team collaboration. Theoretically, we leverage the Kakutani Fixed-Point Theorem to prove that the consensus direction $u^*$ guarantees the existence and attainability of this equilibrium. Extensive experiments on StarCraft II Multi-Agent Challenge (SMAC) and Google Research Football (GRF) demonstrate the scalability and promising performance of the framework.
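The abstract does not state how the consensus direction $u^*$ is constructed from the agents' independent gradients. As a rough illustration of the general idea only, the sketch below uses a plain normalized average as a hypothetical stand-in and then checks how well each agent's gradient aligns with it. The function names `consensus_direction` and `alignment`, the assumption that all agents' gradients live in one shared parameter space, and the averaging rule itself are assumptions for illustration, not the GRASP method.

```python
import numpy as np


def consensus_direction(agent_grads):
    """Hypothetical stand-in for a consensus direction.

    Returns the unit vector along the mean of the per-agent gradients.
    This is NOT the paper's u*; the abstract does not define it.
    """
    grads = np.asarray(agent_grads, dtype=float)  # shape: (n_agents, n_params)
    mean = grads.mean(axis=0)
    norm = np.linalg.norm(mean)
    if norm == 0.0:
        # Gradients cancel exactly; no shared direction under this rule.
        return np.zeros_like(mean)
    return mean / norm


def alignment(agent_grads, u):
    """Inner product of each agent's gradient with the direction u.

    A positive value means a small step along u is, to first order,
    an ascent direction for that agent's own objective.
    """
    return np.asarray(agent_grads, dtype=float) @ u


# Three agents, two shared parameters (toy numbers, not experimental data).
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
u = consensus_direction(grads)
scores = alignment(grads, u)
```

In this toy case every entry of `scores` is positive, so the averaged direction does not conflict with any single agent. With strongly opposing gradients a plain average can yield negative alignment for some agents, which is the kind of conflict a more careful consensus rule would need to resolve.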

0 Citations

0 Influential

4.5 Altmetric

22.5 Score

