2601.01452v4 Jan 04, 2026 cs.LG

Robust and Efficient Zeroth-Order LLM Fine-Tuning via Adaptive Bayesian Subspace Optimizer

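Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory usage by approximating gradients through function evaluations. However, existing methods essentially perform updates in a one-dimensional space and can collapse or degrade substantially under low-precision training. This paper introduces BSZO, an adaptive Bayesian subspace zeroth-order optimizer that uses Kalman filtering to combine finite-difference information across multiple directions within a subspace. BSZO treats each finite-difference measurement as a noisy observation, builds a posterior distribution over the subspace-projected gradient, updates it through Bayesian inference, and adapts to noise variations through a residual-based adaptive mechanism. Theoretical analysis shows that BSZO improves the convergence rate by a factor of $k/\gamma$ over standard ZO methods. In experiments on RoBERTa, Mistral, and OPT models, BSZO outperforms the baselines across various tasks, achieving up to 6.67% absolute average improvement on OPT-13B. BSZO also remains stable under fp16/bf16 precision and keeps memory usage close to that of MeZO (1.00× to 1.08×).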

Original Abstract

Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory by approximating gradients through function evaluations. However, existing methods essentially perform updates in a one-dimensional space, and suffer from collapse or substantial performance degradation under low-precision training. We introduce BSZO, an adaptive Bayesian Subspace Zeroth-Order Optimizer, which applies Kalman filtering to combine finite-difference information across multiple perturbation directions within a subspace. By treating each finite-difference measurement as a noisy observation, BSZO builds a posterior distribution over the subspace-projected gradient and updates it through Bayesian inference, with a residual-based adaptive mechanism to adapt to noise variations. Theoretical analysis shows that BSZO improves the convergence rate by a factor of $k/\gamma$ compared to standard ZO methods. Experiments on RoBERTa, Mistral, and OPT models show that BSZO outperforms the baselines across various tasks, achieving up to 6.67% absolute average improvement on OPT-13B while remaining robust under fp16/bf16 precision and keeping memory usage close to inference-only baselines (1.00× to 1.08× of MeZO).
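The core idea in the abstract, treating each finite-difference measurement as a noisy observation of the subspace-projected gradient and fusing several of them with a Kalman update, can be illustrated with a small NumPy sketch. This is not the authors' BSZO implementation: the function name `kalman_subspace_grad`, the random orthonormal basis, the central-difference probes along subspace coordinates, and the fixed `prior_var` and `noise_var` are all assumptions made for illustration, and the residual-based noise adaptation described in the abstract is left out.

```python
import numpy as np

def kalman_subspace_grad(loss, theta, k=4, eps=1e-3,
                         prior_var=1.0, noise_var=1e-2, rng=None):
    """Illustrative sketch (not the paper's algorithm): estimate the gradient
    of `loss` at `theta` inside a random k-dimensional subspace by fusing
    k finite-difference probes with scalar Kalman updates."""
    rng = np.random.default_rng() if rng is None else rng
    d = theta.size
    # Columns of B form an orthonormal basis of a random k-dim subspace.
    B, _ = np.linalg.qr(rng.standard_normal((d, k)))
    mu = np.zeros(k)              # posterior mean of g = B^T grad
    P = prior_var * np.eye(k)     # posterior covariance of g
    for i in range(k):
        h = np.zeros(k)
        h[i] = 1.0                # probe the i-th subspace coordinate
        u = B @ h                 # corresponding direction in parameter space
        # Central difference: a noisy scalar observation of h^T g.
        y = (loss(theta + eps * u) - loss(theta - eps * u)) / (2.0 * eps)
        resid = y - h @ mu        # innovation (residual)
        S = h @ P @ h + noise_var # innovation variance
        K = P @ h / S             # Kalman gain
        mu = mu + K * resid       # posterior mean update
        P = P - np.outer(K, h @ P)  # posterior covariance update
    # Map the subspace estimate back to the full parameter space.
    return B @ mu, mu, P

# Toy usage on a quadratic loss (hypothetical example, not from the paper).
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
quad_loss = lambda x: 0.5 * x @ A @ x
theta0 = np.ones(5)
g_hat, mu, P = kalman_subspace_grad(quad_loss, theta0, k=3,
                                    rng=np.random.default_rng(0))
theta1 = theta0 - 0.1 * g_hat     # one descent step using the estimate
```

Only forward evaluations of `loss` are needed, which is where the memory saving of ZO methods comes from. The posterior covariance `P` shrinks as probes accumulate, so in this sketch noisier measurements (larger `noise_var`) pull the estimate more strongly toward the prior mean of zero.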

