2605.14587v1 May 14, 2026 cs.LG

Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning

Chunyi Zhou

Citations: 184

h-index: 7

Yang Dai

Citations: 43

h-index: 4

Oubo Ma

Citations: 42

h-index: 3

Ruixiao Lin

Citations: 19

h-index: 3

Jiahao Chen

Citations: 51

h-index: 4

L. Du

Citations: 328

h-index: 9

Shouling Ji

Citations: 18

h-index: 3

Original Abstract

Extensive research has highlighted the severe threats posed by backdoor attacks to deep reinforcement learning (DRL). However, prior studies primarily focus on vanilla scenarios, while plasticity interventions have emerged as indispensable built-in components of modern DRL agents. Despite their effectiveness in mitigating plasticity loss, the impact of these interventions on DRL backdoor vulnerabilities remains underexplored, and this lack of systematic investigation poses risks in practical DRL deployments. To bridge this gap, we empirically study 14,664 cases integrating representative interventions and attack scenarios. We find that only one intervention (i.e., SAM) exacerbates backdoor threats, while other interventions mitigate them. Pathological analysis identifies that the exacerbation is attributed to backdoor gradient amplification, while the mitigation stems from activation pathway disruption and representation space compression. From these findings, we derive two novel insights: (1) a conceptual framework SCC for robust backdoor injection that deconstructs the mechanistic interplay between interventions and backdoors in DRL, and (2) abnormal loss landscape sharpness as a key indicator for DRL backdoor detection.
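The two sharpness-related ideas in the abstract (SAM as an intervention, and loss landscape sharpness as a detection signal) can be illustrated with a toy sketch. It assumes SAM refers to sharpness-aware minimization, which perturbs the weights along the normalized gradient before computing the descent gradient. The quadratic losses, the function names `sam_step` and `sharpness_proxy`, and the values of `rho` and `lr` are illustrative assumptions, not the paper's setup; the abstract does not say how sharpness is measured for detection.

```python
import numpy as np


def sam_step(w, loss_grad, lr=0.1, rho=0.05):
    """One SAM-style update (illustrative sketch, not the paper's code).

    1. Move to the approximate worst-case neighbor w + eps, where eps is
       the gradient rescaled to length rho.
    2. Descend from the ORIGINAL weights using the gradient taken at w + eps.
    """
    g = loss_grad(w)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return w  # stationary point: no direction to perturb along
    eps = rho * g / norm
    return w - lr * loss_grad(w + eps)


def sharpness_proxy(w, loss, loss_grad, rho=0.05):
    """Loss increase after a rho-sized step along the gradient direction.

    This is a hypothetical first-order proxy for sharpness; larger values
    mean the loss rises faster in the neighborhood of w.
    """
    g = loss_grad(w)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return 0.0
    return loss(w + rho * g / norm) - loss(w)


def make_quadratic(A):
    """Toy loss L(w) = 0.5 * w^T A w and its gradient A w (A symmetric)."""
    return (lambda w: 0.5 * w @ A @ w), (lambda w: A @ w)


# Two toy landscapes: one flat, one with a high-curvature direction.
flat_loss, flat_grad = make_quadratic(np.diag([1.0, 1.0]))
sharp_loss, sharp_grad = make_quadratic(np.diag([1.0, 50.0]))

w0 = np.array([1.0, 1.0])
flat_sharpness = sharpness_proxy(w0, flat_loss, flat_grad)
sharp_sharpness = sharpness_proxy(w0, sharp_loss, sharp_grad)
w1 = sam_step(w0, flat_grad)
```

In this toy setting the high-curvature landscape yields the larger proxy value, which is the kind of contrast a sharpness-based detector would look for. Whether a backdoored DRL policy shows the same pattern is the paper's empirical claim and is not demonstrated by this sketch.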
