2603.23007v1 Mar 24, 2026 cs.CR

AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents

Shuchao Pang

Citations: 134

h-index: 7

Tian Dong

Citations: 717

h-index: 10

Minhui Xue

Citations: 136

h-index: 5

Yutao Luo

Citations: 7

h-index: 1

Haotian Zhu

Citations: 16

h-index: 3

Zhigang Lu

Citations: 187

h-index: 8

Yongbin Zhou

Citations: 101

h-index: 6

Original Abstract

The rapid adoption of mobile graphical user interface (GUI) agents, which autonomously control applications and operating systems (OS), exposes new system-level attack surfaces. Existing backdoors against web GUI agents and general GenAI models rely on environmental injection or deceptive pop-ups to mislead the agent's operation. However, these techniques do not work on screenshots-based mobile GUI agents because of restricted trigger design spaces, OS background interference, and conflicts among multiple trigger-action mappings. We propose AgentRAE, a novel backdoor attack capable of inducing Remote Action Execution in mobile GUI agents using visually natural triggers (e.g., benign app icons in notifications). To address the underfitting caused by natural triggers and achieve accurate multi-target action redirection, we design a two-stage pipeline that first enhances the agent's sensitivity to subtle iconographic differences via contrastive learning, and then associates each trigger with a specific mobile GUI agent action through backdoor post-training. Our extensive evaluation shows that the proposed backdoor preserves clean performance while achieving an attack success rate of over 90% across ten mobile operations. Furthermore, the benign-looking triggers are hard to detect visually, and the attack circumvents eight representative state-of-the-art defenses. These results expose an overlooked backdoor vector in mobile GUI agents, underscoring the need for defenses that scrutinize notification-conditioned behaviors and internal agent representations.
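
The abstract does not say which contrastive objective the first stage uses. As a rough illustration only, the sketch below assumes an InfoNCE-style loss over icon embeddings: an anchor icon is pulled toward another view of the same icon and pushed away from other icons. The function names, the toy 2-D embeddings, and the temperature value are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not confirmed by the paper).

    The loss is small when the anchor is much closer to the positive
    than to any negative, and grows as negatives become confusable.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - pos_logit


# Toy embeddings (hypothetical): the anchor and positive are two views of one icon.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]

# A clearly different icon versus a near-identical one.
easy_loss = info_nce(anchor, positive, negatives=[[0.0, 1.0]])
hard_loss = info_nce(anchor, positive, negatives=[[0.95, 0.05]])
```

In this toy setup `hard_loss` is larger than `easy_loss`, which matches the intuition in the abstract: visually similar icons are the hard cases, so training on such a loss would put pressure on the encoder to separate them before triggers are bound to actions in the second stage.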

1 Citations

0 Influential

5 Altmetric

26.0 Score
