2602.04669v1 Feb 04, 2026 cs.LG

Delving into Muon and Beyond: Deep Analysis and Extensions

Xianbiao Qi

Citations: 30

h-index: 4

Marco Chen

Citations: 56

h-index: 3

Jiaquan Ye

Citations: 12

h-index: 2

Yelin He

Product AI, Intellifusion

Citations: 108

h-index: 5

Rong Xiao

Citations: 919

h-index: 9

Original Abstract

The Muon optimizer has recently attracted considerable attention for its strong empirical performance and use of orthogonalized updates on matrix-shaped parameters, yet its underlying mechanisms and relationship to adaptive optimizers such as Adam remain insufficiently understood. In this work, we aim to address these questions through a unified spectral perspective. Specifically, we view Muon as the $p = 0$ endpoint of a family of spectral transformations of the form $U \boldsymbol{\Sigma}^{p} V'$, and consider additional variants with $p = 1/2$, $p = 1/4$, and $p = 1$. These transformations are applied to both first-moment updates, as in momentum SGD, and to root-mean-square (RMS) normalized gradient updates as in Adam. To enable efficient computation, we develop a coupled Newton iteration that avoids explicit singular value decomposition. Across controlled experiments, we find that RMS-normalized updates yield more stable optimization than first-moment updates. Moreover, while spectral compression provides strong stabilization benefits under first-moment updates, the Muon update ($p = 0$) does not consistently outperform Adam. These results suggest that Muon is best understood as an effective form of spectral normalization, but not a universally superior optimization method. Our source code will be released at https://github.com/Ocram7/BeyondMuon.
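To make the spectral family in the abstract concrete, here is a minimal NumPy sketch of the map from a matrix with SVD U Σ V' to U Σ^p V'. It is an illustration under stated assumptions, not the authors' implementation: `spectral_power`, `newton_schulz_orthogonalize`, and `demo_matrix` are hypothetical names, the reference version uses an explicit SVD (which the paper says its method avoids), and the SVD-free routine is the textbook cubic Newton-Schulz iteration for the p = 0 case only, not the coupled Newton iteration the abstract mentions.

```python
import numpy as np


def spectral_power(G, p):
    """Reference version: return U diag(s**p) V^T for the thin SVD G = U diag(s) V^T.

    Assumes all singular values are nonzero (0**0 would evaluate to 1 at p = 0).
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    # Scale the columns of U by s**p, then recombine with V^T.
    return (U * s**p) @ Vt


def newton_schulz_orthogonalize(G, steps=20):
    """SVD-free approximation of the p = 0 case (textbook cubic Newton-Schulz).

    Not the paper's coupled iteration; shown only as a familiar alternative.
    """
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # inside the convergence region of the iteration below.
    X = G / np.linalg.norm(G)
    for _ in range(steps):
        # Each step maps a singular value sigma to 1.5*sigma - 0.5*sigma**3,
        # which pushes it toward 1 while leaving U and V unchanged.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X


def demo_matrix():
    """Hypothetical 4x3 'gradient' with known singular values 4, 1, 0.25."""
    rng = np.random.default_rng(0)
    U, _ = np.linalg.qr(rng.standard_normal((4, 3)))
    V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return U @ np.diag([4.0, 1.0, 0.25]) @ V.T


if __name__ == "__main__":
    G = demo_matrix()
    for p in (1.0, 0.5, 0.25, 0.0):
        s = np.linalg.svd(spectral_power(G, p), compute_uv=False)
        print(f"p = {p}: singular values {np.round(s, 4)}")
```

With singular values 4, 1, and 0.25, the p = 1/2 transform yields 2, 1, and 0.5, and p = 0 yields 1, 1, and 1. Smaller p compresses the spectrum more strongly, which is the sense in which the abstract calls p = 0 an endpoint of the family.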
