2602.00170v1 Jan 30, 2026 cs.LG

The Blessing of Dimensionality in LLM Fine-tuning: A Variance-Curvature Perspective

Yizhou Liu

Jeff Gore

Risto Miikkulainen

Xin Qiu

Qiyao Liang

Jinyeop Song

I. Fiete

Abstract

Weight-perturbation evolution strategies (ES) can fine-tune billion-parameter language models with surprisingly small populations (e.g., $N \approx 30$), contradicting classical zeroth-order curse-of-dimensionality intuition. We also observe a second seemingly separate phenomenon: under fixed hyperparameters, the stochastic fine-tuning reward often rises, peaks, and then degrades in both ES and GRPO. We argue that both effects reflect a shared geometric property of fine-tuning landscapes: they are low-dimensional in curvature. A small set of high-curvature dimensions dominates improvement, producing (i) heterogeneous time scales that yield rise-then-decay under fixed stochasticity, as captured by a minimal quadratic stochastic-ascent model, and (ii) degenerate improving updates, where many random perturbations share similar components along these directions. Using ES as a geometric probe on fine-tuning reward landscapes of GSM8K, ARC-C, and WinoGrande across Qwen2.5-Instruct models (0.5B to 7B), we show that reward-improving perturbations remain empirically accessible with small populations across scales. Together, these results reconcile ES scalability with non-monotonic training dynamics and suggest that high-dimensional fine-tuning may admit a broader class of viable optimization methods than worst-case theory implies.
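The abstract refers to a minimal quadratic stochastic-ascent model but does not spell it out on this page. The sketch below is one possible toy version, not the authors' formulation: it assumes a reward $R(\theta) = -\tfrac{1}{2}\sum_i h_i \theta_i^2$, an update $\theta \leftarrow \theta - \eta h \theta + \xi$ with isotropic Gaussian noise, a handful of high-curvature coordinates that start away from the optimum, and many low-curvature coordinates that start at it. The function name `expected_reward_curve` and all parameter values are hypothetical choices for illustration. Under these assumptions the expected reward has a closed form, so no sampling is needed.

```python
import numpy as np

def expected_reward_curve(h, counts, theta0_sq, eta, noise_var, steps):
    """Expected reward E[R_t] of a toy quadratic stochastic-ascent model.

    Assumed model (illustrative, not taken from the paper):
      R(theta) = -0.5 * sum_i h_i * theta_i^2
      theta_i <- (1 - eta * h_i) * theta_i + xi_i,  xi_i ~ N(0, noise_var)
    Coordinates are grouped: group g has curvature h[g], multiplicity
    counts[g], and initial squared value theta0_sq[g].
    Requires 0 < eta * h < 2 so the noise-free dynamics contract.
    """
    h = np.asarray(h, dtype=float)
    counts = np.asarray(counts, dtype=float)
    theta0_sq = np.asarray(theta0_sq, dtype=float)
    a2 = (1.0 - eta * h) ** 2                      # squared contraction factor
    t = np.arange(steps + 1)[:, None]              # time steps as a column
    decay = a2[None, :] ** t                       # a_i^(2t) for every group
    # Second moment: decaying initial condition plus accumulated noise.
    second_moment = decay * theta0_sq[None, :] + noise_var * (1.0 - decay) / (1.0 - a2[None, :])
    return -0.5 * (second_moment * (h * counts)[None, :]).sum(axis=1)

# Hypothetical setup: 5 sharp directions (h = 1) far from the optimum,
# 2000 flat directions (h = 1e-3) that start exactly at the optimum.
curve = expected_reward_curve(
    h=[1.0, 1e-3],
    counts=[5, 2000],
    theta0_sq=[1.0, 0.0],
    eta=0.1,
    noise_var=1e-4,
    steps=20000,
)
peak = int(np.argmax(curve))
print("initial:", curve[0], "peak step:", peak, "peak:", curve[peak], "final:", curve[-1])
```

In this toy setting the sharp directions relax on a time scale of roughly $1/(\eta h)$ steps and produce the early rise, while the flat directions accumulate noise much more slowly and produce the later decay. That is one way heterogeneous time scales can give rise-then-decay under fixed stochasticity.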
