2602.11528v1 Feb 12, 2026 cs.CR

Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs

Jian Liang (Citations: 2, h-index: 1)

Ran He (Citations: 138, h-index: 5)

D. Yan (Citations: 94, h-index: 4)

Tieniu Tan (Citations: 195, h-index: 5)

Abstract

Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text shared online, enabling rapid and large-scale privacy breaches. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. Moreover, they are inherently limited as altering user text to hide sensitive cues still allows attribute inference to occur through models' reasoning capabilities. To address these limitations, we propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS). TRACE leverages attention mechanisms and inference chain generation to identify and anonymize privacy-leaking textual elements, while RPS employs a lightweight two-stage optimization strategy to induce model rejection behaviors, thereby preventing attribute inference. Evaluations across diverse LLMs show that TRACE-RPS reduces attribute inference accuracy from around 50% to below 5% on open-source models. In addition, our approach offers strong cross-model generalization, prompt-variation robustness, and utility-privacy tradeoffs. Our code is available at https://github.com/Jasper-Yan/TRACE-RPS.
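
The headline metric in the abstract, attribute inference accuracy, can be illustrated with a minimal sketch. This is not taken from the TRACE-RPS repository: the function name, the convention that a model refusal is represented as `None` and counted as a failed inference, and the toy data are all assumptions made for illustration only.

```python
from typing import Optional, Sequence


def attribute_inference_accuracy(
    predictions: Sequence[Optional[str]],
    ground_truth: Sequence[str],
) -> float:
    """Fraction of examples where the attacker's guess matches the true attribute.

    Assumption: a prediction of None stands for a model refusal and
    counts as a miss. Matching is case-insensitive exact match.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        raise ValueError("need at least one example")
    correct = sum(
        1
        for pred, truth in zip(predictions, ground_truth)
        if pred is not None and pred.strip().lower() == truth.strip().lower()
    )
    return correct / len(ground_truth)


# Toy data invented for this sketch; these are not results from the paper.
truth = ["paris", "female", "34", "berlin"]
undefended = ["Paris", "male", "34", "rome"]   # attacker guesses on raw text
defended = [None, None, "29", None]            # refusals and one wrong guess

acc_before = attribute_inference_accuracy(undefended, truth)
acc_after = attribute_inference_accuracy(defended, truth)
print(f"before defense: {acc_before:.2f}, after defense: {acc_after:.2f}")
```

Under this convention, a defense that induces refusals lowers the measured accuracy even when the underlying text still contains cues, which is why the abstract pairs anonymization with an inference-preventing step.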
