2604.09024v1 Apr 10, 2026 cs.CV

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection

Zedian Shao (Citations: 125, h-index: 7)

Hongbin Liu (Citations: 588, h-index: 11)

Yuepeng Hu (Citations: 150, h-index: 7)

N. Gong (Citations: 15,384, h-index: 64)

Original Abstract

Multi-modal large language models (MLLMs) have emerged as powerful tools for analyzing Internet-scale image data, offering significant benefits but also raising critical safety and societal concerns. In particular, open-weight MLLMs may be misused to extract sensitive information from personal images at scale, such as identities, locations, or other private details. In this work, we propose ImageProtector, a user-side method that proactively protects images before sharing by embedding a carefully crafted, nearly imperceptible perturbation that acts as a visual prompt injection attack on MLLMs. As a result, when an adversary analyzes a protected image with an MLLM, the MLLM is consistently induced to generate a refusal response such as "I'm sorry, I can't help with that request." We empirically demonstrate the effectiveness of ImageProtector across six MLLMs and four datasets. Additionally, we evaluate three potential countermeasures, Gaussian noise, DiffPure, and adversarial training, and show that while they partially mitigate the impact of ImageProtector, they simultaneously degrade model accuracy and/or efficiency. Our study focuses on the practically important setting of open-weight MLLMs and large-scale automated image analysis, and highlights both the promise and the limitations of perturbation-based privacy protection.
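
To make the idea of a bounded, nearly imperceptible perturbation concrete, here is a minimal toy sketch. It is not the authors' ImageProtector implementation: the abstract does not state the optimization procedure, so this assumes a projected sign-gradient update under an L-infinity budget, and it replaces the MLLM with a hypothetical linear surrogate whose score stands in for "likelihood of a refusal response". The names `refusal_logit`, `refusal_probability`, `protect`, `epsilon`, `step`, and `iters` are all illustrative assumptions.

```python
import math
import random


def refusal_logit(weights, image):
    """Toy surrogate score (assumption): higher means 'more likely to refuse'."""
    return sum(w * x for w, x in zip(weights, image))


def refusal_probability(weights, image):
    """Squash the toy score into (0, 1) with a logistic function."""
    return 1.0 / (1.0 + math.exp(-refusal_logit(weights, image)))


def protect(image, weights, epsilon=8 / 255, step=1 / 255, iters=20):
    """Raise the toy refusal score while keeping every pixel within
    `epsilon` of the original (L-infinity bound) and inside [0, 1]."""
    protected = list(image)
    for _ in range(iters):
        for i, w in enumerate(weights):
            # For a linear score, the gradient w.r.t. pixel i is just w.
            sign = (w > 0) - (w < 0)
            x = protected[i] + step * sign
            # Project back into the epsilon-ball around the original pixel.
            x = min(max(x, image[i] - epsilon), image[i] + epsilon)
            # Keep the pixel a valid intensity.
            protected[i] = min(max(x, 0.0), 1.0)
    return protected


if __name__ == "__main__":
    random.seed(0)
    original = [random.random() for _ in range(64)]      # flattened toy "image"
    surrogate = [random.uniform(-1, 1) for _ in range(64)]  # toy model weights
    shielded = protect(original, surrogate)
    max_change = max(abs(a - b) for a, b in zip(shielded, original))
    print("max per-pixel change:", max_change)
    print("refusal probability before:", refusal_probability(surrogate, original))
    print("refusal probability after: ", refusal_probability(surrogate, shielded))
```

In a realistic setting the gradient would come from backpropagating a loss on the target refusal text through an open-weight MLLM rather than from fixed linear weights. The projection step is what keeps the change small enough to be hard to see.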
