2605.02317v1 May 04, 2026 cs.AI

Anon: Extrapolating Optimizer Adaptivity Across the Real Spectrum

Kaiyan Zhao (Citations: 56, h-index: 5)

Yiming Wang (Citations: 92, h-index: 6)

Jiajun Wu (Citations: 166, h-index: 6)

Steve Drew (Citations: 10, h-index: 1)

Xiaoguang Niu (Citations: 34, h-index: 4)

Yiheng Zhang (Citations: 18, h-index: 2)

Shaowu Wu (Citations: 1, h-index: 1)

Leong Hou (Citations: 0, h-index: 0)

Abstract

Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods such as SGD on classical architectures like CNNs. We identify a key cause of this performance gap: adaptivity in pre-conditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity over the reals (R), allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness to gradient noise. We theoretically establish convergence guarantees under both convex and non-convex settings. Empirically, Anon consistently outperforms state-of-the-art optimizers on representative image classification, diffusion, and language modeling tasks. These results demonstrate that adaptivity can serve as a valuable tunable design principle, and Anon provides the first unified and reliable framework capable of bridging the gap between classical and modern optimizers and surpassing their advantageous properties.
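The abstract does not state Anon's update rule or the details of IDU, so the following is only a minimal sketch of the general idea of tunable adaptivity, not the authors' method. It assumes a generic form in which the second-moment estimate is raised to a hypothetical exponent `gamma`: `gamma = 0` gives an SGD-with-momentum-like step, `gamma = 0.5` gives an Adam-like step (bias correction omitted), and other real values extrapolate beyond both. The `amsgrad` flag shows the hard max-tracking that the abstract contrasts IDU with. The names `init_state`, `partially_adaptive_step`, and `gamma` are illustrative assumptions.

```python
import numpy as np


def init_state(param):
    """Zero-initialised optimizer state for one parameter array."""
    zeros = np.zeros_like(param, dtype=float)
    return {"m": zeros.copy(), "v": zeros.copy(), "v_max": zeros.copy()}


def partially_adaptive_step(param, grad, state, lr=1e-3, beta1=0.9,
                            beta2=0.999, gamma=0.5, eps=1e-8, amsgrad=False):
    """One update with a hypothetical adaptivity exponent `gamma`.

    This is an illustrative stand-in, NOT Anon's published rule:
      gamma = 0.0 -> denominator is 1, i.e. SGD with EMA momentum
      gamma = 0.5 -> Adam-style sqrt(v) scaling (no bias correction)
    """
    # Exponential moving averages of the gradient and its square.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad ** 2

    v = state["v"]
    if amsgrad:
        # AMSGrad-style hard max-tracking: the estimate never decreases.
        state["v_max"] = np.maximum(state["v_max"], v)
        v = state["v_max"]

    # eps keeps the base strictly positive, so any real gamma is defined.
    denom = (v + eps) ** gamma
    return param - lr * state["m"] / denom


if __name__ == "__main__":
    grad = np.array([1.0, -2.0, 0.5])
    for gamma in (0.0, 0.25, 0.5, 1.0):
        state = init_state(grad)
        step = partially_adaptive_step(np.zeros(3), grad, state,
                                       lr=0.1, gamma=gamma)
        print(f"gamma={gamma}: {step}")
```

In this sketch, larger `gamma` normalises each coordinate more aggressively by its gradient history, while `gamma = 0` ignores that history entirely; how Anon actually parameterises and stabilises this spectrum is described in the paper itself, not here.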
