2601.02751v1 Jan 06, 2026 cs.CL

Window-based Membership Inference Attacks Against Fine-tuned Large Language Models

Yuetian Chen

Yuntao Du

Kaiyuan Zhang

Ashish Kundu

Charles Fleming

Bruno Ribeiro

Ninghui Li

Original Abstract

Most membership inference attacks (MIAs) against Large Language Models (LLMs) rely on global signals, like average loss, to identify training data. This approach, however, dilutes the subtle, localized signals of memorization, reducing attack effectiveness. We challenge this global-averaging paradigm, positing that membership signals are more pronounced within localized contexts. We introduce WBC (Window-Based Comparison), which exploits this insight through a sliding window approach with sign-based aggregation. Our method slides windows of varying sizes across text sequences, with each window casting a binary vote on membership based on loss comparisons between target and reference models. By ensembling votes across geometrically spaced window sizes, we capture memorization patterns from token-level artifacts to phrase-level structures. Extensive experiments across eleven datasets demonstrate that WBC substantially outperforms established baselines, achieving higher AUC scores and 2-3x improvements in detection rates at low false positive thresholds. Our findings reveal that aggregating localized evidence is fundamentally more effective than global averaging, exposing critical privacy vulnerabilities in fine-tuned LLMs.
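The abstract describes WBC only at a high level. The following is a minimal Python sketch of the idea under stated assumptions, not the authors' implementation. It assumes per-token losses from the fine-tuned target model and a reference model are already computed. It also assumes stride-1 windows, a member vote whenever the target's summed loss in a window is lower than the reference's, window sizes spaced by powers of two, and an equal-weight average of the per-size vote fractions. The helper names (`geometric_window_sizes`, `window_vote_fraction`, `wbc_score`, `global_loss_diff_score`) and all parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def geometric_window_sizes(min_size: int, max_size: int, ratio: int = 2) -> List[int]:
    """Hypothetical helper: window sizes min_size, min_size*ratio, ... up to max_size."""
    if min_size < 1 or ratio < 2:
        raise ValueError("need min_size >= 1 and ratio >= 2")
    sizes = []
    size = min_size
    while size <= max_size:
        sizes.append(size)
        size *= ratio
    return sizes


def window_vote_fraction(
    target_losses: Sequence[float], reference_losses: Sequence[float], window: int
) -> Optional[float]:
    """Share of stride-1 windows whose summed target loss is below the reference's.

    Each window casts a binary vote (assumed rule): 1 (member) if the target
    model fits the window better than the reference model, else 0.
    Returns None if the sequence is shorter than the window.
    """
    n = len(target_losses)
    if window > n:
        return None
    # Prefix sums of per-token loss differences (target - reference).
    prefix = [0.0]
    for t, r in zip(target_losses, reference_losses):
        prefix.append(prefix[-1] + (t - r))
    # Only the sign of each window's summed difference matters.
    votes = [
        1 if prefix[i + window] - prefix[i] < 0 else 0
        for i in range(n - window + 1)
    ]
    return sum(votes) / len(votes)


def wbc_score(
    target_losses: Sequence[float],
    reference_losses: Sequence[float],
    min_window: int = 1,
    max_window: Optional[int] = None,
    ratio: int = 2,
) -> float:
    """Equal-weight average of vote fractions over geometric window sizes.

    Higher means more local windows favor membership.
    """
    if len(target_losses) != len(reference_losses):
        raise ValueError("loss sequences must have the same length")
    if max_window is None:
        max_window = len(target_losses)
    fractions = []
    for w in geometric_window_sizes(min_window, max_window, ratio):
        f = window_vote_fraction(target_losses, reference_losses, w)
        if f is not None:
            fractions.append(f)
    # No usable window (e.g. empty input): return an uninformative 0.5.
    return sum(fractions) / len(fractions) if fractions else 0.5


def global_loss_diff_score(
    target_losses: Sequence[float], reference_losses: Sequence[float]
) -> float:
    """Global-averaging baseline: mean reference loss minus mean target loss."""
    n = len(target_losses)
    if n == 0 or n != len(reference_losses):
        raise ValueError("need two non-empty loss sequences of equal length")
    return sum(reference_losses) / n - sum(target_losses) / n


if __name__ == "__main__":
    # Toy per-token losses: the target fits 7 of 8 tokens better, but one token badly.
    target = [1.0] * 7 + [9.0]
    reference = [1.5] * 8
    print(global_loss_diff_score(target, reference))  # -0.5: leans non-member
    print(round(wbc_score(target, reference), 3))  # 0.633: most windows vote member
```

In the toy example, a single poorly fit token pushes the average loss difference to -0.5, so a global-average score leans toward non-member. Most local windows still vote member, and the window-based score stays above 0.5. This mirrors the abstract's argument that averaging dilutes localized memorization signals. The choice of stride, window range, and per-size weighting here is illustrative only.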