2602.19945v1 Feb 23, 2026 cs.LG

DP-FedAdamW: An Efficient Optimizer for Differentially Private Federated Large Models

Jin Liu (Citations: 41, h-index: 3)

Ning Xi (Citations: 82, h-index: 3)

Junkang Liu (Citations: 99, h-index: 6)

Yinbin Miao (Citations: 157, h-index: 2)

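Balancing convergence efficiency and robustness under differential privacy (DP) is a core challenge in federated learning (FL). AdamW accelerates the training and fine-tuning of large models, but we find that applying it directly to differentially private federated learning (DPFL) raises three main problems: (i) data heterogeneity and privacy noise combine to amplify the variance of the second-moment estimator, (ii) DP perturbations introduce bias into the second-moment estimator, and (iii) DP increases AdamW's sensitivity to local overfitting, which worsens client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. The method restores AdamW's behavior under DP by stabilizing the second-moment variance, removing the DP-induced bias, and aligning local updates with the global descent direction to suppress client drift. On the theory side, we construct an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any data heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Experiments across language and vision Transformers and ResNet-18 demonstrate the effectiveness of DP-FedAdamW. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state-of-the-art (SOTA) method by 5.83%. The code is provided in the appendix.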

Original Abstract

Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates training and fine-tuning of large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of the second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplifies AdamW's sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates with the global descent direction to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state-of-the-art (SOTA) by 5.83%. The code is available in the Appendix.
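
Issue (ii) above can be illustrated numerically. The sketch below is not the paper's estimator, which the abstract does not specify. It assumes the common DP-SGD recipe (per-sample clipping followed by Gaussian noise) and shows that squaring a noisy gradient inflates the second moment by the noise variance, and that subtracting the known noise variance is one standard way to remove that offset. The function names, `clip_norm`, and `noise_multiplier` are hypothetical and chosen for illustration only.

```python
import numpy as np


def dp_noisy_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to clip_norm, average, and add Gaussian noise.

    Assumed DP-SGD-style mechanism; returns the noisy mean gradient and the
    per-coordinate noise standard deviation.
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    batch_size = per_sample_grads.shape[0]
    noise_std = noise_multiplier * clip_norm / batch_size
    noise = rng.normal(0.0, noise_std, size=clipped.shape[1])
    return clipped.mean(axis=0) + noise, noise_std


def second_moment_estimates(true_grad, noise_std, trials, rng):
    """Average the squared noisy gradient over many draws.

    naive     estimates E[(g + n)^2] = g^2 + noise_std^2 (biased upward)
    corrected subtracts the known noise variance (illustrative debiasing)
    """
    noisy = true_grad + rng.normal(0.0, noise_std, size=(trials, true_grad.size))
    naive = (noisy ** 2).mean(axis=0)
    corrected = naive - noise_std ** 2
    return naive, corrected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = np.array([0.1, -0.2])
    naive, corrected = second_moment_estimates(g, noise_std=0.5, trials=200_000, rng=rng)
    print("true g^2 :", g ** 2)
    print("naive    :", naive)
    print("corrected:", corrected)
```

When the noise standard deviation is large relative to the gradient, the naive second moment is dominated by the noise term, so an Adam-style update divided by its square root shrinks toward zero regardless of the true gradient scale. That is the intuition behind treating the DP-induced bias explicitly.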

