2606.13385v1 Jun 11, 2026 cs.CR

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

Kangjie Chen
Nanyang Technological University
Bo Li
Zihao Wang
Yiming Li
Yutong Wu
Zheyu Liu
Fok Kar Wai
Pin-Yu Chen
V. Thing
Dacheng Tao
Tianwei Zhang

Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. Existing security benchmarks adopt an attack-centric perspective: they focus on the technical feasibility of injections and overlook how the resulting harms are distributed. In practice, however, prompt-injection risk is victim-dependent: a single exploit can produce asymmetric consequences for different stakeholders, and the same attack pattern may differ substantially in effectiveness depending on whom it targets. To capture these properties, we introduce \sysname, a stakeholder-centric benchmark that systematically categorizes and attributes harm in real-world web agent systems. It distinguishes between affected entities (e.g., user, seller, platform), decomposes attacks into concrete objectives, and evaluates each case with complementary outcome- and process-level metrics. Our results reveal substantial and heterogeneous vulnerabilities: no attack objective is reliably resisted by current agents, and failures fall into qualitatively distinct modes, ranging from stealthy parasitism (the attack succeeds without disrupting the user's delegated task) to misaligned disruption (the task is disrupted without attack success) and compounded failure (the adversarial objective is achieved and task integrity is violated at the same time). Conventional evaluation misses these patterns, which highlights the need for stakeholder-aware assessment of LLM-based agents in real-world deployments. The benchmark is available at https://github.com/StakeBench/SBC.
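
The three failure modes named in the abstract can be read as combinations of two binary observations per test case: whether the attack objective was achieved and whether the user's delegated task was disrupted. The sketch below is only an illustration of that reading, not the benchmark's implementation. The names `Outcome`, `classify`, and `tally_by_stakeholder`, the `NO_FAILURE` label for the fourth combination, and the tuple layout of a case are all assumptions introduced here.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    STEALTHY_PARASITISM = "stealthy parasitism"
    MISALIGNED_DISRUPTION = "misaligned disruption"
    COMPOUNDED_FAILURE = "compounded failure"
    NO_FAILURE = "no failure"  # hypothetical label; the abstract does not name this case


def classify(attack_succeeded: bool, task_disrupted: bool) -> Outcome:
    """Map the two binary observations onto the failure modes from the abstract."""
    if attack_succeeded and task_disrupted:
        return Outcome.COMPOUNDED_FAILURE
    if attack_succeeded:
        return Outcome.STEALTHY_PARASITISM
    if task_disrupted:
        return Outcome.MISALIGNED_DISRUPTION
    return Outcome.NO_FAILURE


def tally_by_stakeholder(cases):
    """Count outcomes per targeted stakeholder.

    Each case is an assumed (stakeholder, attack_succeeded, task_disrupted) tuple.
    """
    counts = {}
    for stakeholder, attack_succeeded, task_disrupted in cases:
        outcome = classify(attack_succeeded, task_disrupted)
        counts.setdefault(stakeholder, Counter())[outcome] += 1
    return counts


# Made-up example cases, for illustration only.
example_cases = [
    ("user", True, False),
    ("user", True, True),
    ("seller", False, True),
    ("platform", False, False),
]
summary = tally_by_stakeholder(example_cases)
```

Grouping the tally by stakeholder reflects the abstract's point that the same attack pattern can behave differently depending on whom it targets. A single aggregate success rate would hide that difference.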
