2603.28900v1 Mar 30, 2026 cs.RO

Robust Multi-Agent Reinforcement Learning for Small UAS Separation Assurance under GPS Degradation and Spoofing

U. Topcu

Citations: 14,490

h-index: 54

Alex Zongo

Citations: 1

h-index: 1

Filippos Fotiadis

Citations: 315

h-index: 8

Peng Wei

Citations: 52

h-index: 3

Original Abstract

We address robust separation assurance for small Unmanned Aircraft Systems (sUAS) under GPS degradation and spoofing via Multi-Agent Reinforcement Learning (MARL). In cooperative surveillance, each aircraft (or agent) broadcasts its GPS-derived position; when such position broadcasts are corrupted, the entire observed air traffic state becomes unreliable. We cast this state observation corruption as a zero-sum game between the agents and an adversary: with probability R, the adversary perturbs the observed state to maximally degrade each agent's safety performance. We derive a closed-form expression for this adversarial perturbation, bypassing adversarial training entirely and enabling linear-time evaluation in the state dimension. We show that this expression approximates the true worst-case adversarial perturbation with second-order accuracy. We further bound the safety performance gap between clean and corrupted observations, showing that it degrades at most linearly with the corruption probability under Kullback-Leibler regularization. Finally, we integrate the closed-form adversarial policy into a MARL policy gradient algorithm to obtain a robust counter-policy for the agents. In a high-density sUAS simulation, we observe near-zero collision rates under corruption levels up to 35%, outperforming a baseline policy trained without adversarial perturbations.
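The observation model described in the abstract (with probability R, the adversary replaces the clean state with a perturbed one) can be illustrated with a small sketch. The abstract does not give the paper's closed-form perturbation, so the code below swaps in a generic first-order steepest-descent step of bounded L2 norm, which also costs linear time in the state dimension. The names `first_order_perturbation`, `corrupt_observation`, `corruption_prob` (standing in for R), `eps`, and the toy gradient are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def first_order_perturbation(grad, eps):
    """Return an L2-bounded step that decreases a value estimate fastest.

    Generic first-order heuristic (an assumption for illustration),
    NOT the closed-form expression derived in the paper.
    Runs in O(d) for a d-dimensional state.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No descent direction exists; leave the observation unchanged.
        return [0.0] * len(grad)
    return [-eps * g / norm for g in grad]


def corrupt_observation(state, grad, corruption_prob, eps, rng):
    """With probability corruption_prob return a perturbed copy of state,
    otherwise return a clean copy."""
    if rng.random() < corruption_prob:
        delta = first_order_perturbation(grad, eps)
        return [s + d for s, d in zip(state, delta)]
    return list(state)


# Toy usage: hypothetical value V(s) = ||s||^2, so grad V(s) = 2 * s.
rng = random.Random(0)
state = [3.0, 4.0]
grad = [2.0 * s for s in state]

clean = corrupt_observation(state, grad, 0.0, 0.5, rng)  # never corrupted
worst = corrupt_observation(state, grad, 1.0, 0.5, rng)  # always corrupted
```

In this toy setup the corrupted observation moves a distance of exactly `eps` against the gradient, while a corruption probability of zero returns the clean state. A training loop would draw a fresh corruption decision at every step for every agent.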
