2603.16576v1 Mar 17, 2026 cs.CV

REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models

Renyang Liu (Citations: 130, h-index: 6)
Yonglong Zou (Citations: 2, h-index: 1)
Haoran Li (Citations: 8, h-index: 1)
Shenyang Wei (Citations: 0, h-index: 0)
Fanxiao Li (Citations: 47, h-index: 4)
Yunyun Dong (Citations: 56, h-index: 4)
Linyu Tang (Citations: 13, h-index: 1)
Wei Zhou (Citations: 60, h-index: 4)

Abstract

Recent progress in image generation models (IGMs) enables high-fidelity content creation but also amplifies risks, including the reproduction of copyrighted content and the generation of offensive content. Image Generation Model Unlearning (IGMU) mitigates these risks by removing harmful concepts without full retraining. Despite growing attention, the robustness of IGMU under adversarial inputs, particularly image-side threats in black-box settings, remains underexplored. To bridge this gap, we present REFORGE, a black-box red-teaming framework that evaluates IGMU robustness via adversarial image prompts. REFORGE initializes stroke-based images and optimizes perturbations with a cross-attention-guided masking strategy that allocates noise to concept-relevant regions, balancing attack efficacy and visual fidelity. Extensive experiments across representative unlearning tasks and defenses demonstrate that REFORGE significantly improves attack success rate while achieving stronger semantic alignment and higher efficiency than the compared baselines. These results expose persistent vulnerabilities in current IGMU methods and highlight the need for robustness-aware unlearning against multi-modal adversarial attacks. Our code is at: https://github.com/Imfatnoily/REFORGE.
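To make the masking idea in the abstract concrete, here is a minimal NumPy sketch of restricting a perturbation to high-attention regions. It is an illustration under stated assumptions, not the REFORGE implementation: the function names, the top-fraction thresholding rule, the `keep_ratio` parameter, and the L-infinity budget `eps` are all hypothetical. The attention map is taken as a given array, and the stroke-based initialization and the black-box optimization loop are not shown.

```python
import numpy as np


def attention_mask(attn_map, keep_ratio=0.3):
    """Binary (H, W) mask selecting roughly the top `keep_ratio` fraction
    of pixels by attention score. Hypothetical rule; ties at the threshold
    can select slightly more pixels."""
    flat = attn_map.ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    # The k-th largest value becomes the threshold.
    thresh = np.partition(flat, flat.size - k)[flat.size - k]
    return (attn_map >= thresh).astype(np.float32)


def apply_masked_perturbation(image, delta, mask, eps=8 / 255):
    """Add `delta` to an (H, W, C) image in [0, 1], only where mask == 1,
    clipping the perturbation to an assumed L-infinity budget `eps`."""
    delta = np.clip(delta, -eps, eps)
    # Broadcast the (H, W) mask across the channel axis.
    adv = image + delta * mask[..., None]
    return np.clip(adv, 0.0, 1.0)


def attack_success_rate(successes):
    """Fraction of attack attempts flagged as successful."""
    return float(np.mean(successes))
```

Concentrating the perturbation where the attention is highest is one plausible way to trade attack strength against visible distortion: pixels outside the mask are left untouched, so most of the image keeps its original appearance.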

Citations: 0, Influential: 0, Altmetric: 23, Score: 115.0
