2603.10397v1 Mar 11, 2026 cs.LG

A Study of the Learning Dynamics of Two-Layer Linear Networks Trained by Stochastic Gradient Descent (SGD) with Label Noise

On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

Junchi Yan (Citations: 52, h-index: 2)

Zhanpeng Zhou (Citations: 53, h-index: 4)

Andi Han (Citations: 76, h-index: 5)

Tongcheng Zhang (Citations: 97, h-index: 3)

Mingze Wang (Citations: 247, h-index: 11)

Wei Huang (Citations: 8, h-index: 2)

Taiji Suzuki (Citations: 89, h-index: 5)

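One factor behind the success of deep learning is the implicit bias induced by the noise inherent in gradient-based training algorithms. Motivated by the empirical observation that training with noisy labels improves generalization, this work analyzes the underlying mechanism of stochastic gradient descent (SGD) with label noise. Focusing on a two-layer over-parameterized linear network, it studies the learning dynamics of label noise SGD and finds a two-phase learning behavior. In Phase I, the magnitude of the model weights gradually decreases, and the model leaves the "lazy" regime and enters the "rich" regime. In Phase II, the alignment between the model weights and the ground-truth interpolator increases, and the model eventually converges. The analysis highlights the important role of label noise in driving the transition from the lazy regime to the rich regime, which partially explains why label noise succeeds empirically. The work also extends these insights to Sharpness-Aware Minimization (SAM), showing that the principles governing label noise SGD apply to a broader class of optimization algorithms. Extensive experiments on both synthetic and real-world data strongly support the theoretical claims. The code is available at https://github.com/a-usually/Label-Noise-SGD.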

Original Abstract

One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we delve into the underlying mechanisms behind stochastic gradient descent (SGD) with label noise. Focusing on a two-layer over-parameterized linear network, we analyze the learning dynamics of label noise SGD, unveiling a two-phase learning behavior. In Phase I, the magnitudes of model weights progressively diminish, and the model escapes the lazy regime and enters the rich regime. In Phase II, the alignment between model weights and the ground-truth interpolator increases, and the model eventually converges. Our analysis highlights the critical role of label noise in driving the transition from the lazy to the rich regime and minimally explains its empirical success. Furthermore, we extend these insights to Sharpness-Aware Minimization (SAM), showing that the principles governing label noise SGD also apply to broader optimization algorithms. Extensive experiments, conducted under both synthetic and real-world setups, strongly support our theory. Our code is released at https://github.com/a-usually/Label-Noise-SGD.
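
To make the setup in the abstract concrete, the following is a minimal sketch of label noise SGD on a two-layer linear network, written with only the Python standard library. It is not the authors' implementation (see their repository for that). The function names, the parameterization f(x) = sum_j a[j] * <W[j], x>, the random-sign noise of size sigma, the toy data generator, and every hyperparameter value are assumptions chosen for illustration. The sketch records the squared parameter norm at each step so that the weight-magnitude behavior described for Phase I can be inspected, but whether a decrease is visible depends on the hyperparameters.

```python
import random


def predict(a, W, x):
    """Two-layer linear network: f(x) = sum_j a[j] * <W[j], x>."""
    return sum(a_j * sum(w * xi for w, xi in zip(w_j, x))
               for a_j, w_j in zip(a, W))


def clean_mse(a, W, X, y):
    """Mean squared error measured against the clean (noise-free) labels."""
    return sum((predict(a, W, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def make_toy_data(n=20, d=3, seed=1):
    """Hypothetical toy regression data: y = <beta_star, x> with Gaussian x."""
    rng = random.Random(seed)
    beta_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [sum(b * xi for b, xi in zip(beta_star, x)) for x in X]
    return X, y


def label_noise_sgd(X, y, width=8, lr=0.01, sigma=0.5, steps=2000,
                    init_scale=0.5, seed=0):
    """Single-sample SGD where fresh noise is added to the label at every step.

    Returns the trained weights (a, W) and the history of the squared
    parameter norm ||a||^2 + ||W||_F^2, one entry per step.
    """
    rng = random.Random(seed)
    d = len(X[0])
    a = [rng.gauss(0.0, init_scale) for _ in range(width)]
    W = [[rng.gauss(0.0, init_scale) for _ in range(d)] for _ in range(width)]
    norm_history = []
    for _ in range(steps):
        i = rng.randrange(len(X))
        x = X[i]
        # Assumed noise model: perturb the label by +sigma or -sigma.
        noisy_y = y[i] + sigma * rng.choice((-1.0, 1.0))
        hidden = [sum(w * xi for w, xi in zip(w_j, x)) for w_j in W]
        residual = sum(a_j * h for a_j, h in zip(a, hidden)) - noisy_y
        # Gradients of the per-sample loss 0.5 * residual**2.
        new_a = [a_j - lr * residual * h for a_j, h in zip(a, hidden)]
        # The W update uses the pre-update a, so both layers step together.
        W = [[w - lr * residual * a_j * xi for w, xi in zip(w_j, x)]
             for a_j, w_j in zip(a, W)]
        a = new_a
        norm_history.append(sum(v * v for v in a)
                            + sum(w * w for w_j in W for w in w_j))
    return a, W, norm_history


if __name__ == "__main__":
    X, y = make_toy_data()
    a, W, norms = label_noise_sgd(X, y)
    print("clean MSE:", clean_mse(a, W, X, y))
    print("squared weight norm, first vs. last step:", norms[0], norms[-1])
```

Setting `sigma=0.0` recovers plain single-sample SGD, which gives a baseline for comparing the weight-norm trajectories with and without label noise.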
