2601.20125v3 Jan 27, 2026 cs.LG

Membership Inference Attacks Against Fine-tuned Diffusion Language Models

Yuetian Chen (Citations: 18, h-index: 3)
Yuntao Du (Citations: 43, h-index: 4)
Kaiyuan Zhang (Citations: 30, h-index: 3)
Ashish Kundu (Citations: 91, h-index: 7)
Charles Fleming (Citations: 42, h-index: 3)
Bruno Ribeiro (Citations: 33, h-index: 3)
Edoardo Stoppa (Citations: 14, h-index: 3)
Ninghui Li (Citations: 55, h-index: 5)

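Diffusion language models (DLMs), which use bidirectional masked-token prediction, are emerging as a promising alternative to autoregressive language models. However, their vulnerability to privacy leakage through membership inference attacks (MIA) has not yet been studied adequately. This paper presents the first systematic study of MIA vulnerabilities in DLMs. Unlike autoregressive models, which have a single fixed prediction pattern, DLMs admit many mask configurations, so the number of attack opportunities grows exponentially. The ability to probe these many masks greatly improves the chance of a successful attack. To exploit this, the authors propose SAMA (Subset-Aggregated Membership Attack), which uses robust aggregation to address the sparse-signal problem. SAMA samples masked subsets at varying densities and applies sign-based statistics that are robust to heavy-tailed noise. Through inverse-weighted aggregation that prioritizes the cleaner signals of sparse masks, SAMA turns the detection of sparse memorization patterns into a robust voting mechanism. In experiments on nine datasets, SAMA achieves a 30% relative AUC improvement over the best baseline and up to an 8x improvement at low false positive rates. These results expose a significant vulnerability in DLMs and underscore the need for tailored privacy defenses.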

Original Abstract

Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility to privacy leakage via Membership Inference Attacks (MIA) remains critically underexplored. This paper presents the first systematic investigation of MIA vulnerabilities in DLMs. Unlike the autoregressive models' single fixed prediction pattern, DLMs' multiple maskable configurations exponentially increase attack opportunities. This ability to probe many independent masks dramatically improves detection chances. To exploit this, we introduce SAMA (Subset-Aggregated Membership Attack), which addresses the sparse signal challenge through robust aggregation. SAMA samples masked subsets across progressive densities and applies sign-based statistics that remain effective despite heavy-tailed noise. Through inverse-weighted aggregation prioritizing sparse masks' cleaner signals, SAMA transforms sparse memorization detection into a robust voting mechanism. Experiments on nine datasets show SAMA achieves 30% relative AUC improvement over the best baseline, with up to 8 times improvement at low false positive rates. These findings reveal significant, previously unknown vulnerabilities in DLMs, necessitating the development of tailored privacy defenses.
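The abstract names three ingredients: masked subsets sampled at progressive densities, a sign-based statistic, and inverse-weighted aggregation that favors sparse masks. The sketch below shows one way these pieces could fit together. It is not the authors' implementation. The abstract does not say what the sign is taken of or how the weights are defined, so the comparison against a reference model, the 1/density weighting, the density schedule, and every name here (`sama_style_score`, `target_loss`, `reference_loss`) are assumptions made for illustration.

```python
import random
from typing import Callable, Sequence

# Hypothetical interface: given the masked positions of one candidate text,
# return the model's per-position reconstruction loss at those positions.
LossFn = Callable[[Sequence[int]], Sequence[float]]


def sama_style_score(
    seq_len: int,
    target_loss: LossFn,
    reference_loss: LossFn,
    densities: Sequence[float] = (0.05, 0.1, 0.2, 0.4),  # assumed schedule
    subsets_per_density: int = 32,
    seed: int = 0,
) -> float:
    """Return a score in [0, 1]; higher values suggest membership."""
    rng = random.Random(seed)
    weighted_votes = 0.0
    total_weight = 0.0
    for density in densities:
        k = max(1, round(density * seq_len))  # number of masked positions
        votes = 0
        for _ in range(subsets_per_density):
            masked = rng.sample(range(seq_len), k)
            t = sum(target_loss(masked)) / k
            r = sum(reference_loss(masked)) / k
            # Sign-based statistic: record only whether the target model
            # beats the reference, so a few huge losses cannot dominate.
            if t < r:
                votes += 1
        # Assumed inverse weighting: sparser masks get a larger weight.
        weight = 1.0 / density
        weighted_votes += weight * votes / subsets_per_density
        total_weight += weight
    return weighted_votes / total_weight


# Toy stand-ins for real models: constant per-token losses.
def reference(masked: Sequence[int]) -> list:
    return [1.0 for _ in masked]


def memorized(masked: Sequence[int]) -> list:
    return [0.5 for _ in masked]


def unseen(masked: Sequence[int]) -> list:
    return [1.5 for _ in masked]


member_score = sama_style_score(40, memorized, reference)
non_member_score = sama_style_score(40, unseen, reference)
```

With these toy loss functions every sampled subset votes the same way, so `member_score` is 1.0 and `non_member_score` is 0.0. With real models the votes would be mixed, and the score would be compared against a threshold chosen for a target false positive rate.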

3 Citations, 1 Influential, 3.5 Altmetric, 22.5 Score
