2603.16470v1 Mar 17, 2026 cs.IT

Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems

Marios Aristodemou

Yasaman Omid

S. Lambotharan

Mahsa Derakhshan

Lajos Hanzo

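Integrating satellite communication networks with next-generation (NG) technologies is a promising route to global connectivity. However, quality of service depends heavily on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is difficult because the high propagation delay between terrestrial users and satellites leaves the CSI observed at the satellite outdated. This paper studies a system in which multiple satellites act as distributed base stations (BS) providing downlink transmission to terrestrial users. It proposes a multi-agent reinforcement learning (MARL) algorithm that maximises the users' sum-rate while coping with outdated CSI. The authors design a novel bi-level optimisation procedure, dual stage proximal policy optimisation (DS-PPO), to handle large continuous action spaces and independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for each individual satellite, and the second stage maximises the sum-rate when all satellites cooperate to form a distributed multi-antenna BS. Numerical results show that DS-PPO is robust to CSI imperfections and improves the sum-rate. A convergence analysis and the computational complexity of DS-PPO are also provided.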

Original Abstract

The integration of satellite communication networks with next-generation (NG) technologies is a promising approach towards global connectivity. However, the quality of service is highly dependent on the availability of accurate channel state information (CSI). Channel estimation in satellite communications is challenging due to the high propagation delay between terrestrial users and satellites, which results in outdated CSI observations on the satellite side. In this paper, we study the downlink transmission of multiple satellites acting as distributed base stations (BS) to mobile terrestrial users. We propose a multi-agent reinforcement learning (MARL) algorithm that aims to maximise the sum-rate of the users while coping with the outdated CSI. We design a novel bi-level optimisation procedure, termed dual stage proximal policy optimisation (DS-PPO), for tackling the problem of large continuous action spaces as well as of independent and non-identically distributed (non-IID) environments in MARL. Specifically, the first stage of DS-PPO maximises the sum-rate for an individual satellite and the second stage maximises the sum-rate when all the satellites cooperate to form a distributed multi-antenna BS. Our numerical results demonstrate the robustness of DS-PPO to CSI imperfections as well as the sum-rate improvement attained by the use of DS-PPO. In addition, we provide the convergence analysis for DS-PPO along with its computational complexity.
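
To make the outdated-CSI problem concrete, the following is a minimal sketch of how a sum-rate degrades when a precoder is designed from stale channel observations. It is not the paper's method: the abstract does not specify a channel model or precoder, so the Rayleigh fading, the first-order Gauss-Markov ageing model with correlation `rho`, the zero-forcing precoder, and all function names here are illustrative assumptions. DS-PPO would replace the fixed precoder with a learned policy.

```python
import numpy as np

def rayleigh(rng, shape):
    """Draw i.i.d. CN(0, 1) channel coefficients (assumed fading model)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def age_channel(h_old, rho, rng):
    """Assumed Gauss-Markov ageing: h_now = rho * h_old + sqrt(1 - rho^2) * innovation."""
    return rho * h_old + np.sqrt(1.0 - rho**2) * rayleigh(rng, h_old.shape)

def zf_precoder(h_est, total_power):
    """Zero-forcing precoder built from the (possibly outdated) CSI estimate.

    h_est has shape (users, antennas); the result has shape (antennas, users)
    and is scaled so that its squared Frobenius norm equals total_power.
    """
    w = np.linalg.pinv(h_est)
    return w * np.sqrt(total_power) / np.linalg.norm(w)

def sum_rate(h_true, w, noise_power=1.0):
    """Sum over users of log2(1 + SINR), evaluated on the true channel."""
    gains = np.abs(h_true @ w) ** 2          # gains[k, j]: power at user k from stream j
    signal = np.diag(gains)
    interference = gains.sum(axis=1) - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + noise_power))))

# Hypothetical setup: 4 users, 8 antennas in total across the cooperating satellites.
rng = np.random.default_rng(0)
h_old = rayleigh(rng, (4, 8))                # CSI as observed (already delayed)
w = zf_precoder(h_old, total_power=10.0)
for rho in (1.0, 0.9, 0.5):                  # rho = 1.0 means the CSI is not outdated
    h_now = age_channel(h_old, rho, rng)
    print(f"rho={rho}: sum-rate = {sum_rate(h_now, w):.2f} bit/s/Hz")
```

With `rho = 1.0` the precoder matches the true channel and inter-user interference vanishes; as `rho` falls, the mismatch between `h_old` and `h_now` reintroduces interference. A delay-aware policy such as the one the abstract describes aims to recover that lost rate.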
