2605.02202v1 May 04, 2026 cs.AI

CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models

Cencen Liu

Citations: 39

h-index: 3

Ji Guo

Citations: 100

h-index: 5

Wenbo Jiang

Citations: 5

h-index: 1

Xiao Qin

Citations: 14

h-index: 2

Jielei Wang

Citations: 53

h-index: 4

Jie Chen

Citations: 112

h-index: 5

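Vision-language models (VLMs) have achieved remarkable success in tasks such as image captioning and visual question answering (VQA). However, as their applications continue to expand, recent studies have shown that VLMs are vulnerable to backdoor attacks. Existing backdoor attacks on VLMs mainly poison data by adding visual triggers and modifying text labels, and the resulting image-text mismatch makes the poisoned samples easy to detect. To overcome this limitation, we propose CBV, a clean-label backdoor attack framework that uses diffusion models to generate natural-looking poisoned samples. Specifically, CBV modifies the score during the reverse generation process of the diffusion model to guide the generation of poisoned samples that contain triggered image features. To further improve attack effectiveness, we incorporate the textual information of the triggered images into the generation process as multimodal guidance. In addition, to improve stealthiness, we introduce a GradCAM-guided Mask (GM) that restricts modifications to semantically important regions only. We evaluate CBV on the MSCOCO and VQA v2 datasets with four representative VLMs, and it achieves an attack success rate (ASR) above 80% while preserving normal functionality.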

Original Abstract

Vision-Language Models (VLMs) have achieved remarkable success in tasks such as image captioning and visual question answering (VQA). However, as their applications become increasingly widespread, recent studies have revealed that VLMs are vulnerable to backdoor attacks. Existing backdoor attacks on VLMs primarily rely on data poisoning by adding visual triggers and modifying text labels, where the induced image-text mismatch makes poisoned samples easy to detect. To address this limitation, we propose the Clean-Label Backdoor Attack on VLMs via Diffusion Models (CBV), which leverages diffusion models to generate natural poisoned examples via score matching. Specifically, CBV modifies the score during the reverse generation process of the diffusion model to guide the generation of poisoned samples that contain triggered image features. To further enhance the effectiveness of the attack, we incorporate the textual information of the triggered images as multimodal guidance during generation. Moreover, to enhance stealthiness, we introduce a GradCAM-guided Mask (GM) that restricts modifications to only the most semantically important regions, rather than the entire image. We evaluate our method on MSCOCO and VQA v2 with four representative VLMs, achieving over 80% ASR while preserving normal functionality.
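
The masking idea in the abstract (restricting modifications to the most semantically important regions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_fraction_mask` and `apply_masked_edit`, the `keep_fraction` parameter, and the use of a plain 2D saliency grid in place of a real GradCAM heatmap are all assumptions made for illustration.

```python
def top_fraction_mask(saliency, keep_fraction=0.25):
    """Return a binary mask marking the highest-saliency cells.

    `saliency` is a 2D list of floats standing in for a GradCAM heatmap
    (an assumption; the paper's actual mask construction is not given here).
    Ties at the threshold may keep slightly more cells than requested.
    """
    flat = sorted((v for row in saliency for v in row), reverse=True)
    k = max(1, round(keep_fraction * len(flat)))
    threshold = flat[k - 1]
    return [[1 if v >= threshold else 0 for v in row] for row in saliency]


def apply_masked_edit(original, edited, mask):
    """Take edited values where mask is 1 and original values elsewhere."""
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


# Hypothetical 2x2 example: only the two most salient cells are modified.
saliency = [[0.9, 0.1], [0.2, 0.8]]
mask = top_fraction_mask(saliency, keep_fraction=0.5)
result = apply_masked_edit([[0, 0], [0, 0]], [[5, 5], [5, 5]], mask)
```

Under these assumptions, cells outside the mask stay identical to the original image, which is the property the abstract associates with stealthiness.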

2 Citations

0 Influential

2.5 Altmetric

14.5 Score
