2602.00329v3 Jan 30, 2026 cs.LG

Real-Time Data Shapley Value Computation for the Adam Optimizer

In-Run Data Shapley for Adam Optimizer

Lijie Hu

Citations: 811

h-index: 15

Meng Ding

Citations: 1

h-index: 1

Zeqing Zhang

Citations: 4

h-index: 1

Di Wang

Citations: 3

h-index: 1

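Data attribution is essential in modern machine learning for mitigating bias and saving computational resources, and the Shapley value is widely used as the theoretical standard. Recent "In-Run" methods estimate contributions dynamically to avoid the prohibitive cost of model retraining, but they rely heavily on the linear structure of stochastic gradient descent (SGD) and fail to capture the complex behavior of adaptive optimizers such as Adam. This work shows that data attribution is inherently optimizer-dependent: SGD-based approximations deviate substantially from true contributions under Adam (Pearson $R \approx 0.11$), so they are ineffective in modern training pipelines. To close this gap, the work proposes an Adam-specific In-Run Data Shapley method. It derives a closed-form approximation that restores additivity by redefining utility under a fixed-state assumption, and enables scalable computation through a new Linearized Ghost Approximation. This technique linearizes the variance-dependent scaling term so that pairwise gradient dot products can be computed without explicitly computing per-sample gradients. Extensive experiments show that the proposed method achieves near-perfect fidelity to the true contributions ($R > 0.99$) while retaining about 95% of standard training throughput. The Adam-specific attribution method also substantially outperforms existing SGD-based methods on downstream data attribution tasks.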

Original Abstract

Reliable data attribution is essential for mitigating bias and reducing computational waste in modern machine learning, with the Shapley value serving as the theoretical gold standard. While recent "In-Run" methods bypass the prohibitive cost of retraining by estimating contributions dynamically, they heavily rely on the linear structure of Stochastic Gradient Descent (SGD) and fail to capture the complex dynamics of adaptive optimizers like Adam. In this work, we demonstrate that data attribution is inherently optimizer-dependent: we show that SGD-based proxies diverge significantly from true contributions under Adam (Pearson $R \approx 0.11$), rendering them ineffective for modern training pipelines. To bridge this gap, we propose Adam-Aware In-Run Data Shapley. We derive a closed-form approximation that restores additivity by redefining utility under a fixed-state assumption and enable scalable computation via a novel Linearized Ghost Approximation. This technique linearizes the variance-dependent scaling term, allowing us to compute pairwise gradient dot-products without materializing per-sample gradients. Extensive experiments show that our method achieves near-perfect fidelity to ground-truth marginal contributions ($R > 0.99$) while retaining $\sim$95\% of standard training throughput. Furthermore, our Adam-aware attribution significantly outperforms SGD-based baselines in data attribution downstream tasks.
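The idea of computing gradient dot-products "without materializing per-sample gradients" can be illustrated for a single linear layer. The sketch below is not the paper's Linearized Ghost Approximation; it only shows the standard identity that such techniques build on, plus a simplified Adam-like variant in which the second-moment estimate is treated as a fixed diagonal preconditioner (an assumption made here for illustration). The function names, shapes, and the `precond` argument are hypothetical.

```python
import numpy as np

# Setting (assumed for illustration): one linear layer with weight W of shape
# (d_out, d_in). For sample i, a_i is the layer input and b_i is the gradient
# of the loss w.r.t. the layer output, so the per-sample weight gradient is
# g_i = outer(b_i, a_i). We never build g_i explicitly.

def ghost_pairwise_dots(a, b):
    """Return the (n, n) matrix of <g_i, g_j> using only a and b.

    Identity: <outer(b_i, a_i), outer(b_j, a_j)> = (a_i . a_j) * (b_i . b_j).
    """
    return (a @ a.T) * (b @ b.T)

def first_order_scores(a, b, g_val, lr, precond=None):
    """Return per-sample first-order scores -lr * <g_val, P * g_i>.

    g_val   : validation gradient w.r.t. W, shape (d_out, d_in)
    precond : optional fixed elementwise scaling P (e.g. 1 / (sqrt(v) + eps)
              with v frozen); None gives the plain SGD-style score.
    Uses <M, outer(b_i, a_i)> = b_i^T M a_i, so g_i is never materialized.
    """
    m = g_val if precond is None else g_val * precond
    return -lr * np.einsum("no,oi,ni->n", b, m, a)

# Hypothetical usage with random data.
rng = np.random.default_rng(0)
n, d_in, d_out = 4, 3, 2
a = rng.normal(size=(n, d_in))
b = rng.normal(size=(n, d_out))
g_val = rng.normal(size=(d_out, d_in))
v = rng.random(size=(d_out, d_in))       # stand-in for Adam's second moment
precond = 1.0 / (np.sqrt(v) + 1e-8)      # treated as fixed (assumption)

sgd_scores = first_order_scores(a, b, g_val, lr=0.1)
adam_like_scores = first_order_scores(a, b, g_val, lr=0.1, precond=precond)
```

With a fixed preconditioner the two score vectors generally differ in both magnitude and ranking, which gives some intuition for why an SGD-based proxy can be a poor match for training under Adam. In actual Adam training the scaling term itself depends on the batch gradients, which is the dependence the abstract says the method linearizes.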
