2605.07174v1 May 08, 2026 cs.AI

Repeated Deceptive Path Planning against Learnable Observer

Shiyu Zhang

Shiyue Cao

Pei Xu

Likun Yang

Lei Cui

Xiaotang Chen

Kaiqi Huang

Shizhao Yu

Yongjian Ren

Abstract

We study the problem of deceptive path planning (DPP), where an agent aims to conceal its true destination from external observers. While existing work assumes static, non-learning observers, real-world adversaries (for example, in critical goods transportation or military operations) can adapt by learning from historical trajectories. To address this gap, we introduce Repeated Deceptive Path Planning (RDPP), a new formulation that explicitly models learnable observers. We show that existing DPP methods fail in this setting because they cannot adapt to evolving adversarial predictions. Incorporating the observer's previous predictions into updates enables some adaptation, but such incremental updates cause a cumulative lag that degrades deception. To this end, we propose Deceptive Meta Planning (DeMP), a two-level optimization framework that combines episode-level adaptation, which adjusts the policy in the short term to counter the updated observer, with meta-level updates, which leverage cross-episode feedback to capture how the observer updates its model and to accelerate adaptation in future episodes. In this way, DeMP mitigates the accumulation of adaptation lag, enabling sustained deception against a learning observer. Experiments across multiple environments demonstrate that DeMP significantly outperforms existing approaches in RDPP while maintaining competitive path cost. Our results highlight the importance of modeling repeated interactions with learnable adversaries, providing new insights into deception and privacy in multi-agent systems.
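To make the repeated setting concrete, here is a toy illustration built on heavily simplified, hypothetical assumptions. It is not the authors' RDPP environment and not DeMP. In each episode the agent's true goal is one of two candidates, and its only decision is a decoy rate `d`: the probability of initially heading toward the other goal, which adds path cost. The observer learns from past trajectories by tracking an estimate `d_hat` of that rate. This is modeled, as an assumption, with an exponential moving average. Once the observer believes decoys are common, it starts inverting its goal prediction. All names (`deception`, `best_response`, `run`, `DETOUR_COST`, `OBSERVER_LR`, `SHARPNESS`) and all constants are hypothetical. The sketch compares two agents. The first keeps a plan that was optimal only against the initial, naive observer. The second re-plans each episode against the observer's current model.

```python
import math

# Hypothetical toy constants (illustrative assumptions, not values from the paper).
DETOUR_COST = 0.2   # expected extra path cost per unit of decoy rate
OBSERVER_LR = 0.3   # observer's learning rate over past trajectories (EMA)
SHARPNESS = 20.0    # how decisively the observer acts on its estimate


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def deception(d: float, d_hat: float) -> float:
    """Probability that the observer's goal prediction is wrong.

    The agent heads toward the decoy goal with probability d. The observer
    predicts the goal opposite to the heading with probability s, which grows
    with its learned estimate d_hat of the agent's decoy rate.
    """
    s = sigmoid(SHARPNESS * (d_hat - 0.5))
    return d * (1.0 - s) + (1.0 - d) * s


def best_response(d_hat: float, grid: int = 21) -> float:
    """Choose the decoy rate that maximizes deception minus path cost against
    the observer's current model (grid search over [0, 1])."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda d: deception(d, d_hat) - DETOUR_COST * d)


def run(adaptive: bool, episodes: int = 50) -> list[float]:
    """Return per-episode deception against an observer that keeps learning."""
    d_hat = 0.0                     # naive observer: trusts the agent's heading
    d_fixed = best_response(d_hat)  # one-shot plan made against the naive observer
    scores = []
    for _ in range(episodes):
        # The adaptive agent re-plans against the observer's current model;
        # the fixed agent reuses its original plan every episode.
        d = best_response(d_hat) if adaptive else d_fixed
        scores.append(deception(d, d_hat))
        # After the episode the observer sees the full trajectory and moves its
        # estimate toward the agent's decoy rate (expected EMA update).
        d_hat += OBSERVER_LR * (d - d_hat)
    return scores


if __name__ == "__main__":
    fixed, adaptive = run(adaptive=False), run(adaptive=True)
    print(f"fixed plan,    mean deception over last 10 episodes: {sum(fixed[-10:]) / 10:.3f}")
    print(f"adaptive plan, mean deception over last 10 episodes: {sum(adaptive[-10:]) / 10:.3f}")
```

With these settings, the fixed plan should lose almost all of its deception as `d_hat` converges. This mirrors the abstract's point that static DPP fails against a learning observer. The per-episode re-planner instead stays well ahead of the slower-learning observer. The toy deliberately leaves out the cumulative adaptation lag and DeMP's meta-level update, because the abstract does not specify them in enough detail to reproduce.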
