2603.10048v1 Mar 09, 2026 cs.LG

Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation

Jianlong Chen

Citations: 18

h-index: 3

Zhiming Zhou

Shanghai University of Finance and Economics

Citations: 553

h-index: 8

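Sharpness-Aware Minimization (SAM) improves generalization by minimizing the maximum training loss within a predefined neighborhood of the parameters. In practice, however, implementations approximate this by taking one or more gradient ascent steps and then applying the gradient at the ascent point to the current parameters. This can be justified as approximately optimizing the objective while neglecting the (full) derivative of the ascent point with respect to the current parameters. Still, a direct and intuitive understanding of why updating the current parameters with the gradient at the ascent point works better is lacking. This work fills that gap with a new interpretation. We show that the gradient computed at the single-step ascent point, when applied to the current parameters, approximates the direction from the current parameters toward the maximum within the local neighborhood better than the local gradient does. This improved approximation enables a more direct escape from that maximum. Our analysis also reveals two issues. First, the gradient at the single-step ascent point often gives an inaccurate approximation. Second, approximation quality can degrade as the number of ascent steps grows. To address these limitations, the paper proposes eXplicit Sharpness-Aware Minimization (XSAM). XSAM addresses the first issue by explicitly estimating the direction of the maximum during training, and the second by designing a search space that makes effective use of the gradient information at the multi-step ascent point. XSAM has a unified formulation that applies to both single-step and multi-step settings and adds negligible computational overhead. Extensive experiments show that XSAM consistently outperforms existing methods.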
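
For readers who prefer symbols, the objective and its usual single-step approximation can be written as below. The notation is an assumption, since the abstract gives none: L is the training loss, w the parameters, ρ the neighborhood radius, η the learning rate, and the ℓ2 norm is the common choice rather than something the abstract specifies. This is the widely used SAM formulation, not the XSAM update, which the abstract does not spell out.

```latex
% SAM objective: minimize the worst-case loss in a ball of radius \rho
\[
  \min_{w} \; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon)
\]

% Single-step ascent perturbation (first-order estimate of the inner maximizer)
\[
  \hat{\epsilon}(w) = \rho \, \frac{\nabla L(w)}{\|\nabla L(w)\|_2}
\]

% Full gradient of the surrogate L(w + \hat{\epsilon}(w)) by the chain rule
\[
  \nabla_w L\bigl(w + \hat{\epsilon}(w)\bigr)
  = \Bigl(I + \frac{\partial \hat{\epsilon}(w)}{\partial w}\Bigr)^{\top}
    \nabla L(w') \Big|_{w' = w + \hat{\epsilon}(w)}
\]

% Practical update: neglect \partial \hat{\epsilon} / \partial w
\[
  w \leftarrow w - \eta \, \nabla L(w') \Big|_{w' = w + \hat{\epsilon}(w)}
\]
```

The third line is where the "(full) derivative of the ascent point with respect to the current parameters" appears; dropping it leaves the gradient at the ascent point, applied to the current parameters.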

Original Abstract

Sharpness-Aware Minimization (SAM) enhances generalization by minimizing the maximum training loss within a predefined neighborhood around the parameters. However, its practical implementation approximates this as gradient ascent(s) followed by applying the gradient at the ascent point to update the current parameters. This practice can be justified as approximately optimizing the objective by neglecting the (full) derivative of the ascent point with respect to the current parameters. Nevertheless, a direct and intuitive understanding of why using the gradient at the ascent point to update the current parameters works superiorly is still lacking. Our work bridges this gap by proposing a novel and intuitive interpretation. We show that the gradient at the single-step ascent point, when applied to the current parameters, provides a better approximation of the direction from the current parameters toward the maximum within the local neighborhood than the local gradient. This improved approximation thereby enables a more direct escape from the maximum within the local neighborhood. Nevertheless, our analysis further reveals two issues. First, the approximation by the gradient at the single-step ascent point is often inaccurate. Second, the approximation quality may degrade as the number of ascent steps increases. To address these limitations, we propose in this paper eXplicit Sharpness-Aware Minimization (XSAM). It tackles the first by explicitly estimating the direction of the maximum during training, while addressing the second by crafting a search space that effectively leverages the gradient information at the multi-step ascent point. XSAM features a unified formulation that applies to both single-step and multi-step settings and only incurs negligible computational overhead. Extensive experiments demonstrate the consistent superiority of XSAM against existing counterparts.
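
The practical procedure the abstract describes (ascend once, then apply the gradient at the ascent point to the current parameters) can be sketched in a few lines. This is a minimal illustration of standard single-step SAM, not of XSAM, whose update rule the abstract does not give. The toy quadratic loss, the names `CURVATURE`, `sam_step`, and `train`, and the values of `lr` and `rho` are all assumptions chosen for illustration.

```python
import math

# Toy loss (assumed for illustration): L(w) = 0.5 * sum(a_i * w_i^2)
CURVATURE = [10.0, 1.0]


def loss(w):
    """Quadratic training loss on a 2-D parameter vector."""
    return 0.5 * sum(a * x * x for a, x in zip(CURVATURE, w))


def grad(w):
    """Analytic gradient of the toy loss."""
    return [a * x for a, x in zip(CURVATURE, w)]


def sam_step(w, lr=0.05, rho=0.05):
    """One single-step SAM update with hypothetical hyperparameters."""
    g = grad(w)
    # Small constant avoids division by zero at a stationary point.
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12
    # Ascent: move a distance rho along the normalized local gradient.
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    # Gradient at the ascent point; the dependence of w_adv on w is ignored.
    g_adv = grad(w_adv)
    # Descent: apply that gradient to the ORIGINAL parameters, not to w_adv.
    return [x - lr * gx for x, gx in zip(w, g_adv)]


def train(w, steps=100):
    """Repeat the SAM update for a fixed number of steps."""
    for _ in range(steps):
        w = sam_step(w)
    return w
```

Calling `train([1.0, 1.0])` drives the toy loss down from its starting value. A multi-step variant would repeat the ascent inside `sam_step` before taking the final gradient, which is the setting where the abstract reports that approximation quality can degrade.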

0 Citations

0 Influential

4 Altmetric

20.0 Score
