2601.17360v1 Jan 24, 2026 cs.LG

Robust Privacy: Inference-Time Privacy through Certified Robustness

Deyue Zhang (Citations: 158; h-index: 4)
Xiangzheng Zhang (Citations: 20; h-index: 3)
Dongdong Yang (Citations: 71; h-index: 3)
Jiankai Jin (Citations: 76; h-index: 3)
Zhao Liu (Citations: 31; h-index: 3)
Quanchen Zou (Citations: 24; h-index: 2)
Wenzhuo Xu (Citations: 24; h-index: 3)

Abstract

Machine learning systems can produce personalized outputs that allow an adversary to infer sensitive input attributes at inference time. We introduce Robust Privacy (RP), an inference-time privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-$R$ neighborhood around an input $x$ (e.g., under the $\ell_2$ norm), then $x$ enjoys $R$-Robust Privacy, i.e., observing the prediction cannot distinguish $x$ from any input within distance $R$ of $x$. We further develop Attribute Privacy Enhancement (APE) to translate input-level invariance into an attribute-level privacy effect. In a controlled recommendation task where the decision depends primarily on a sensitive attribute, we show that RP expands the set of sensitive-attribute values compatible with a positive recommendation and widens the adversary's inference interval accordingly. Finally, we empirically demonstrate that RP also mitigates model inversion attacks (MIAs) by masking fine-grained input-output dependence. Even at a small noise level ($\sigma=0.1$), RP reduces the attack success rate (ASR) from 73% to 4%, at the cost of partial model performance degradation. RP can also partially mitigate MIAs (e.g., ASR drops to 44%) with no model performance degradation.
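The abstract does not say how the invariance certificate is obtained. As a minimal sketch only, the example below assumes Gaussian randomized smoothing, a standard certified-robustness technique in which a noise level $\sigma$ yields an $\ell_2$ radius $R = \sigma\,\Phi^{-1}(p)$ whenever the top class wins with probability $p > 1/2$ under noise. The names `base_classifier` and `certify`, the toy decision rule, and the Hoeffding confidence bound are all illustrative assumptions, not the authors' implementation.

```python
import math
import random
from statistics import NormalDist


def base_classifier(x):
    """Hypothetical base model: recommend (1) when the first feature exceeds 0.5."""
    return 1 if x[0] > 0.5 else 0


def certify(f, x, sigma, n=2000, alpha=0.001, seed=0):
    """Return (label, radius) for the Gaussian-smoothed version of f at x.

    Assumption: randomized smoothing stands in for whatever certificate the
    paper uses. Within the returned l2 radius the smoothed prediction is
    constant, so the prediction alone cannot separate x from those inputs.
    Simplification: the same samples pick the top class and bound its
    probability; a careful implementation would use separate samples.
    """
    rng = random.Random(seed)
    votes = {}
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        label = f(noisy)
        votes[label] = votes.get(label, 0) + 1
    top, count = max(votes.items(), key=lambda kv: kv[1])
    # One-sided Hoeffding lower bound on the top-class probability,
    # valid with probability at least 1 - alpha.
    p_lower = count / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: no invariance radius can be certified
    return top, sigma * NormalDist().inv_cdf(p_lower)


if __name__ == "__main__":
    print(certify(base_classifier, [2.0, 0.0], sigma=0.1))  # far from the boundary
    print(certify(base_classifier, [0.5, 0.0], sigma=0.1))  # on the boundary
```

Under these assumptions, an input far from the decision boundary receives a positive radius, while an input on the boundary is left uncertified. The radius is limited both by $\sigma$ and by the sample count `n`, since the confidence bound caps the estimated probability below 1.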
