2605.27932v1 May 27, 2026 cs.CV

When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness?

Fangzhou Wu
Fangzhou Wu
Citations: 337
h-index: 7
Binghan Lu
Binghan Lu
Citations: 1
h-index: 1
Bing Hu
Bing Hu
Citations: 17
h-index: 2
Yuan Tian
Yuan Tian
Citations: 31
h-index: 3
Xiaomin Li
Xiaomin Li
Citations: 129
h-index: 6
N. Gong
N. Gong
Citations: 15,384
h-index: 64

Think-with-image reasoning is emerging as a new inference paradigm for large vision-language models, but its safety implications remain poorly understood. Existing systems already span multiple process designs, including direct response generation, text-only prior turn, visual-state manipulation, and explicit external image-tool invocation. In this paper, we ask which of these evaluated paradigms improves multimodal jailbreak robustness, and why. Across multiple vision-language models, explicit image-tool interaction yields the lowest attack success rates in our experiments, reducing jailbreak success by around 30% relative on average across the evaluated models. This finding is initially surprising: ASR remains low even when the returned image-tool output is manually overridden or itself unsafe-looking, but returns near direct-answering levels under text-only prior turn controls. These results indicate that the lower ASR is not explained by benign returned-image semantics or by the textual image-tool trace alone. To explain the pattern, we introduce an image-tool safety vector framework that models image-tool invocation as a residual shift in hidden representations toward a safety-relevant direction. Representation-level analyses and activation interventions support this account. Overall, our results suggest that explicit image-tool interaction is a promising design pattern for improving jailbreak robustness, while also motivating pipeline-specific safety evaluation.

0 Citations
0 Influential
30 Altmetric
150.0 Score
Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

Log in to request an AI analysis.

댓글

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!