2602.06771v2 Feb 06, 2026 cs.LG

AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models

Fengpeng Li

Citations: 61

h-index: 4

Kemou Li

Citations: 65

h-index: 4

Qizhou Wang

Citations: 666

h-index: 13

Bo Han

Citations: 207

h-index: 6

Jiantao Zhou

Citations: 31

h-index: 2

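Concept erasure helps prevent diffusion models (DMs) from generating harmful content. However, current methods face a trade-off between robustness and retention. Robustness means that a model fine-tuned with a concept erasure method resists reactivation of the erased concepts; retention means that unrelated concepts are preserved so that the model's overall utility is maintained. Both matter for practical concept erasure, but addressing them together is difficult, and existing work tends to improve one at the expense of the other. For example, mapping a single erased prompt to a fixed safe target leaves class-level remnants that prompt attacks can exploit, while retention-oriented schemes perform worse against adaptive attacks. This paper introduces Adversarial Erasure with Gradient Informed Synergy (AEGIS), a retention-data-free framework that improves both robustness and retention.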

Original Abstract

Concept erasure helps stop diffusion models (DMs) from generating harmful content, but current methods face a robustness-retention trade-off. Robustness means the model fine-tuned by concept erasure methods resists reactivation of erased concepts, even under semantically related prompts. Retention means unrelated concepts are preserved so the model's overall utility stays intact. Both are critical for concept erasure in practice, yet addressing them simultaneously is challenging: prior work typically strengthens one while degrading the other. For example, mapping a single erased prompt to a fixed safe target leaves class-level remnants exploitable by prompt attacks, whereas retention-oriented schemes underperform against adaptive adversaries. This paper introduces Adversarial Erasure with Gradient Informed Synergy (AEGIS), a retention-data-free framework that advances both robustness and retention.

4 Citations

0 Influential

6.5 Altmetric

36.5 Score
