2602.01308v2 Feb 01, 2026 cs.LG

Overcoming the Curse of Singularities in Neural Network Optimization

Dispelling the Curse of Singularities in Neural Network Optimizations

Ruijun Huang (Citations: 8, h-index: 2)
Fang Dong (Citations: 25, h-index: 3)
Anrui Chen (Citations: 9, h-index: 2)
Jixian Zhou (Citations: 28, h-index: 3)
Mengyi Chen (Citations: 13, h-index: 3)
Mingzhi Dong (Citations: 39, h-index: 5)
Yujiang Wang (Citations: 31, h-index: 4)
Hengjie Cao (Citations: 8, h-index: 2)
Yifeng Yang (Citations: 14, h-index: 3)
Dongsheng Li (Citations: 4, h-index: 1)
Wenyi Fang (Citations: 7, h-index: 1)
Yuanyi Lin (Citations: 0, h-index: 0)
Fan Wu (Citations: 12, h-index: 1)
Lili Shang (Citations: 15, h-index: 3)

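This study analyzes optimization instability in deep neural networks from a less-explored but insightful angle: the emergence and amplification of singularities in parameter space. The analysis shows that parametric singularities inevitably grow during gradient updates, which strengthens their alignment with the representations and in turn increases singularities in representation space. The authors show that the Frobenius norm of the gradient is bounded by the top singular value of the weight matrix. As training proceeds, weight and representation singularities reinforce each other's growth, a phenomenon they call the "curse of singularities", which loosens this bound and raises the risk of sharp loss explosions. To address this, they propose Parametric Singularity Smoothing (PSS), a lightweight, flexible, and effective method that smooths the singular spectra of weight matrices. Extensive experiments across diverse datasets, architectures, and optimizers show that PSS mitigates instability, restores trainability even after a failure, and improves training efficiency and generalization.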

Original Abstract

This work investigates the optimization instability of deep neural networks from a less-explored yet insightful perspective: the emergence and amplification of singularities in the parametric space. Our analysis reveals that parametric singularities inevitably grow with gradient updates and further intensify alignment with representations, leading to increased singularities in the representation space. We show that the gradient Frobenius norms are bounded by the top singular values of the weight matrices, and as training progresses, the mutually reinforcing growth of weight and representation singularities, termed the curse of singularities, relaxes these bounds, escalating the risk of sharp loss explosions. To counter this, we propose Parametric Singularity Smoothing (PSS), a lightweight, flexible, and effective method for smoothing the singular spectra of weight matrices. Extensive experiments across diverse datasets, architectures, and optimizers demonstrate that PSS mitigates instability, restores trainability even after failure, and improves both training efficiency and generalization.
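The link between a weight matrix's top singular value and gradient size can be illustrated with a small NumPy sketch. It relies only on the standard inequality that, for a linear layer y = Wx, the backpropagated gradient W^T G satisfies ||W^T G||_F <= sigma_max(W) ||G||_F. The function `smooth_spectrum` and its `alpha` parameter are hypothetical: the abstract does not state the actual PSS update rule, so this stand-in simply pulls every singular value toward the spectrum's mean and should not be read as the authors' method.

```python
import numpy as np


def top_singular_value(weight):
    """Largest singular value (spectral norm) of a 2-D weight matrix."""
    return float(np.linalg.svd(weight, compute_uv=False)[0])


def smooth_spectrum(weight, alpha=0.5):
    """Hypothetical smoothing step, NOT the paper's PSS rule.

    Each singular value is moved toward the mean of the spectrum.
    alpha=0 leaves the matrix unchanged; alpha=1 makes all singular
    values equal. The sum of singular values is preserved.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    s_smooth = (1.0 - alpha) * s + alpha * s.mean()
    return (u * s_smooth) @ vt  # scale the columns of u, then recompose


rng = np.random.default_rng(0)

# Toy weight matrix whose leading singular value is inflated tenfold,
# mimicking one dominant direction in the spectrum.
u, s, vt = np.linalg.svd(rng.standard_normal((64, 32)), full_matrices=False)
s[0] *= 10.0
weight = (u * s) @ vt

# For y = W x, the gradient w.r.t. the input is W^T G, where G is the
# gradient w.r.t. the output (here a batch of 16 columns).
upstream_grad = rng.standard_normal((64, 16))
input_grad = weight.T @ upstream_grad

sigma_before = top_singular_value(weight)
sigma_after = top_singular_value(smooth_spectrum(weight, alpha=0.5))

print("||W^T G||_F         :", np.linalg.norm(input_grad))
print("sigma_max * ||G||_F :", sigma_before * np.linalg.norm(upstream_grad))
print("sigma_max before/after smoothing:", sigma_before, sigma_after)
```

In this toy setting, a larger top singular value loosens the bound on the backpropagated gradient, and any operation that flattens the spectrum tightens it again. Whether PSS works this way in detail is not stated in the abstract.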

