2605.28116v1 May 27, 2026 cs.CR

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Yi Liu
Gelei Deng
Yue-Ying Li
L. Zhang
Ying Zhang
Ruoqi Guo
Yiheng Xiong
Lida Zhao
Ji Jie
Yuxiao Lu

Mobile graphical user interface (GUI) agents driven by vision-language models (VLMs) perceive the screen as rendered pixels and choose actions from what they see, so they cannot reliably separate trusted interface elements from user-generated content. We present MIRAGE (Mobile Injection of Realistic Adversarial GUI Examples), a pipeline that turns benign mobile screenshots into prompt-injection samples by placing attacker-controlled text into ordinary user-generated content regions, without modifying the agent, the application, or the operating system. MIRAGE operates in three stages: a Localizer identifies user-controllable regions on the screenshot, a Generator synthesises context-aware payloads and renders them in the application's native style, and a Curator moderates realism and balances the samples across applications, region types, and attack intents. A key challenge is that an injected screenshot must stay visually indistinguishable from genuine user content while still diverting the agent; we address this by separating the stages that control reach, realism, and distributional balance. On a 1,111-sample benchmark spanning ten applications and eleven attack intents, all five evaluated VLM agents are vulnerable, with attack success rates of 23%-30%, and MIRAGE scores higher on human realism ratings than the strongest prior attack (3.02 versus 2.52 out of 5). We further find that per-sample realism and attack success are uncorrelated, so visual-quality filtering alone cannot reliably defend against this threat.
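The three-stage structure described above can be sketched as a small data flow. This is a minimal illustration only, not the authors' implementation: the abstract does not specify any interfaces, so every name here (`Region`, `Sample`, `localize`, `generate`, `curate`, the `realism` field, and the bucket cap) is a hypothetical stand-in. The Localizer is replaced by reading pre-annotated regions, the Generator by an inert placeholder template, and the Curator by a realism threshold followed by a per-bucket cap over (application, region type, intent).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A user-controllable area of a screenshot (hypothetical structure)."""
    app: str
    region_type: str                  # e.g. "comment", "review", "message"
    box: tuple[int, int, int, int]    # left, top, right, bottom in pixels


@dataclass(frozen=True)
class Sample:
    """One candidate sample: a region plus the text placed in it."""
    region: Region
    intent: str
    payload: str
    realism: float                    # placeholder score in [0, 1]


def localize(screenshot_meta: dict) -> list[Region]:
    """Stage 1 stand-in: read pre-annotated regions instead of detecting them."""
    return [
        Region(screenshot_meta["app"], r["type"], tuple(r["box"]))
        for r in screenshot_meta["regions"]
        if r["user_controllable"]
    ]


def generate(region: Region, intent: str) -> Sample:
    """Stage 2 stand-in: an inert template replaces the context-aware generator."""
    payload = f"[{intent}] placeholder text styled for a {region.region_type}"
    return Sample(region, intent, payload, realism=0.5)


def curate(samples: list[Sample], min_realism: float, per_bucket: int) -> list[Sample]:
    """Stage 3 stand-in: drop low-realism samples, then cap each
    (app, region type, intent) bucket so no combination dominates."""
    buckets: dict[tuple[str, str, str], list[Sample]] = defaultdict(list)
    for s in samples:
        if s.realism >= min_realism:
            buckets[(s.region.app, s.region.region_type, s.intent)].append(s)
    kept: list[Sample] = []
    for key in sorted(buckets):
        kept.extend(buckets[key][:per_bucket])
    return kept


# Toy input: one user-controllable region and one trusted UI element.
meta = {
    "app": "demo_app",
    "regions": [
        {"type": "comment", "box": [0, 100, 400, 160], "user_controllable": True},
        {"type": "toolbar", "box": [0, 0, 400, 50], "user_controllable": False},
    ],
}
regions = localize(meta)
samples = [generate(r, i) for r in regions for i in ("intent_a", "intent_b")] * 3
benchmark = curate(samples, min_realism=0.4, per_bucket=2)
```

Keeping the stages as separate functions mirrors the abstract's point that reach (which regions are eligible), realism (how the payload is produced and scored), and distributional balance (how samples are capped per bucket) are controlled independently.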
