2605.07111v1 May 08, 2026 cs.CL

Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

Hao Tang

Xiuqi Zhu

Xinying Zhang

Boxun Li

Virginia Smith

Kevin Kuo


Original Abstract

Recent literature on fine-tuning Large Language Models (LLMs) highlights a fundamental debate. While Full Fine-Tuning (FFT) provides the representational plasticity required for high-entropy knowledge injection, Low-Rank Adaptation (LoRA) can match or surpass FFT performance because many tasks only require updates in a low-rank space and benefit from LoRA's additional regularization. Through empirical evaluation across diverse tasks (SQL, Medical QA, and Counterfactual Knowledge) and varying language models (Gemma-3-1B, Qwen2.5-1.5B, and Qwen2.5-3B), we verify both trends and demonstrate that relying solely on either static architecture is structurally limited. To address this challenge, we propose Mixture of LoRA and Full (MoLF) Fine-Tuning, a unified framework that enables continuous navigation between both training regimes. MoLF dynamically routes updates between FFT and LoRA at the optimizer level, ensuring that exact gradient signals are available to both experts throughout training and yielding stable training dynamics. For memory-constrained environments, we also introduce MoLF-Efficient, which freezes the base weights and routes updates only between a pair of LoRA experts of potentially different ranks. Our evaluations show that MoLF either improves on or stays within $1.5\%$ of the better of FFT and LoRA across all settings, while MoLF-Efficient outperforms prior adaptive LoRA approaches by up to $20\%$ on Fact and $9\%$ on Med and SQL.
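The abstract does not spell out the routing rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a single linear layer whose effective weight is `W + B @ A`, plain SGD, and a hypothetical scalar `gate` in [0, 1] that weights the full-weight expert against the LoRA expert. The one exact gradient `G` with respect to the effective weight serves both experts via the chain rule. All function names and the gating scheme are illustrative assumptions; `trainable_params` simply contrasts the trainable parameter counts of the two regimes.

```python
# Illustrative sketch only: NOT the MoLF algorithm from the paper.
# Assumes effective weight W + B @ A, plain SGD, and a hypothetical scalar gate.

def matmul(X, Y):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*X)]

def sgd(X, grad, step):
    """Return X - step * grad, elementwise."""
    return [[x - step * g for x, g in zip(xr, gr)] for xr, gr in zip(X, grad)]

def trainable_params(d_out, d_in, rank=None):
    """Trainable entries for a d_out x d_in weight: full, or LoRA of the given rank."""
    if rank is None:
        return d_out * d_in          # full fine-tuning updates every entry
    return rank * (d_out + d_in)     # LoRA trains B (d_out x r) and A (r x d_in)

def routed_step(W, B, A, G, gate, lr):
    """One SGD step; `gate` (hypothetical) is the share given to the full expert.

    G is the exact gradient dL/d(W + B A). It is also dL/dW, and the chain
    rule gives dL/dB = G A^T and dL/dA = B^T G, so both experts see it.
    """
    grad_B = matmul(G, transpose(A))
    grad_A = matmul(transpose(B), G)
    W_new = sgd(W, G, lr * gate)
    B_new = sgd(B, grad_B, lr * (1.0 - gate))
    A_new = sgd(A, grad_A, lr * (1.0 - gate))
    return W_new, B_new, A_new

# Toy usage: a 2x2 layer with a rank-1 LoRA pair (B starts at zero, as is common).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.0], [0.0]]
A = [[1.0, 1.0]]
G = [[0.5, 0.0], [0.0, 0.5]]
full_only = routed_step(W, B, A, G, gate=1.0, lr=0.1)   # only W moves
lora_only = routed_step(W, B, A, G, gate=0.0, lr=0.1)   # only the LoRA factors move
```

With `gate=1.0` the step reduces to ordinary full fine-tuning, and with `gate=0.0` it reduces to a LoRA step on frozen base weights; intermediate values interpolate between the two. How the paper actually chooses the routing, and whether it is scalar, per-layer, or per-parameter, is not stated in the abstract.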
