2604.04090v1 Apr 05, 2026 cs.LG

Fine-grained Analysis of Stability and Generalization for Stochastic Bilevel Optimization

Xuelin Zhang (citations: 20, h-index: 2)

Hong Chen (citations: 34, h-index: 4)

Bin Gu (citations: 31, h-index: 4)

Tieliang Gong (citations: 185, h-index: 7)

Feng Zheng (citations: 306, h-index: 7)

Abstract

Stochastic bilevel optimization (SBO) has been integrated into many machine learning paradigms recently, including hyperparameter optimization, meta learning, and reinforcement learning. Along with the wide range of applications, there have been numerous studies on the computational behavior of SBO. However, the generalization guarantees of SBO methods are far less understood from the lens of statistical learning theory. In this paper, we provide a systematic generalization analysis of the first-order gradient-based bilevel optimization methods. Firstly, we establish the quantitative connections between the on-average argument stability and the generalization gap of SBO methods. Then, we derive the upper bounds of on-average argument stability for single-timescale stochastic gradient descent (SGD) and two-timescale SGD, where three settings (nonconvex-nonconvex (NC-NC), convex-convex (C-C), and strongly-convex-strongly-convex (SC-SC)) are considered respectively. Experimental analysis validates our theoretical findings. Compared with the previous algorithmic stability analysis, our results do not require reinitializing the inner-level parameters at each iteration and are applicable to more general objective functions.
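
To make the notion of on-average argument stability more concrete, the following is a minimal sketch on a toy scalar problem, not the authors' algorithm or experimental setup. It runs a first-order bilevel SGD on a dataset and on neighboring datasets that differ in one sample, then averages the distance between the resulting parameters. Everything here is an assumption made for illustration: the quadratic objectives `f` and `g`, the function names `bilevel_sgd` and `on_average_stability`, the choice to perturb only the validation (outer-level) samples, and the reading of "two-timescale" as using a different step size for each level. Note that the inner variable `y` is carried across iterations rather than reinitialized.

```python
import random


def grad_outer_x(x, y, z):
    # Gradient in x of the toy outer loss f(x, y; z) = 0.5*(x - z)**2 + 0.5*(x - y)**2.
    return (x - z) + (x - y)


def grad_inner_y(x, y, z):
    # Gradient in y of the toy inner loss g(x, y; z) = 0.5*(y - z)**2 + 0.5*(y - x)**2.
    return (y - z) + (y - x)


def bilevel_sgd(val, train, steps, eta_x, eta_y, seed=0):
    # One stochastic gradient step per level per iteration.
    # eta_x == eta_y mimics a single-timescale scheme; eta_x != eta_y is
    # used here as a stand-in for a two-timescale scheme (an assumption).
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for _ in range(steps):
        zv = val[rng.randrange(len(val))]      # outer-level (validation) sample
        zt = train[rng.randrange(len(train))]  # inner-level (training) sample
        gx = grad_outer_x(x, y, zv)
        gy = grad_inner_y(x, y, zt)
        x, y = x - eta_x * gx, y - eta_y * gy  # simultaneous update, y is not reset
    return x, y


def on_average_stability(val, train, steps, eta_x, eta_y, seed=0):
    # Empirical estimate: average parameter distance between the run on `val`
    # and runs on copies of `val` with the i-th sample replaced.
    # The same seed is reused so both runs draw the same sample indices.
    base = bilevel_sgd(val, train, steps, eta_x, eta_y, seed)
    rng = random.Random(seed + 1)
    total = 0.0
    for i in range(len(val)):
        neighbor = list(val)
        neighbor[i] = rng.gauss(0.0, 1.0)  # replacement sample
        pert = bilevel_sgd(neighbor, train, steps, eta_x, eta_y, seed)
        total += abs(base[0] - pert[0]) + abs(base[1] - pert[1])
    return total / len(val)


data_rng = random.Random(42)
val = [data_rng.gauss(1.0, 1.0) for _ in range(20)]
train = [data_rng.gauss(-1.0, 1.0) for _ in range(20)]

single = on_average_stability(val, train, 200, eta_x=0.05, eta_y=0.05)
two = on_average_stability(val, train, 200, eta_x=0.01, eta_y=0.05)
print("single-timescale estimate:", single)
print("two-timescale estimate:", two)
```

A smaller estimate means the output parameters are less sensitive to replacing a single sample, which is the quantity the abstract relates to the generalization gap. The numbers from this toy problem say nothing about the paper's bounds; they only show how such a quantity can be measured.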
