2602.03120v1 Feb 03, 2026 cs.LG

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

Yinggan Xu (Citations: 8, h-index: 2)

Risto Miikkulainen (Citations: 70, h-index: 5)

Xin Qiu (Citations: 36, h-index: 3)

Abstract

Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients. They therefore cannot be applied to quantized models, whose parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of quantized parameters can still fail because of vanishing or inaccurate gradients. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it uses stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning of quantized models possible. It therefore opens up the possibility of scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies .
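The two ideas named in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation, and the abstract does not specify how QES combines the two mechanisms. The sketch assumes weights stored as integers on a uniform grid with step `scale`, Gaussian perturbations applied to the dequantized weights, and a residual kept explicitly in floating point. All function names and parameters (`noise_from_seed`, `round_with_error_feedback`, `es_step`, `sigma`, `lr`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def noise_from_seed(seed, shape):
    """Rebuild a Gaussian perturbation from its integer seed (nothing is stored)."""
    return np.random.default_rng(seed).standard_normal(shape)


def round_with_error_feedback(update, residual, scale):
    """Round a real-valued update to whole grid steps and carry the remainder.

    Returns (integer steps, new residual), with
    update + residual == steps * scale + new residual (up to float rounding).
    """
    desired = update + residual
    steps = np.rint(desired / scale).astype(np.int64)
    return steps, desired - steps * scale


def es_step(w_int, residual, fitness, seeds, scale, sigma=0.1, lr=0.05):
    """One toy ES update on integer-grid weights (hypothetical helper)."""
    w = w_int * scale  # dequantized view of the current weights
    # Pass 1: score each perturbed candidate; keep only its scalar reward.
    rewards = np.array(
        [fitness(w + sigma * noise_from_seed(s, w.shape)) for s in seeds]
    )
    # Standardise rewards so the step size does not depend on reward units.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Pass 2: replay each seed to regenerate its noise and accumulate the
    # reward-weighted average, i.e. the usual ES search direction.
    direction = np.zeros_like(w)
    for a, s in zip(adv, seeds):
        direction += a * noise_from_seed(s, w.shape)
    direction /= len(seeds)
    # Assumption: the residual is held in floating point for clarity.
    steps, residual = round_with_error_feedback(lr * direction, residual, scale)
    return w_int + steps, residual
```

In this sketch, five consecutive updates of 0.04 on a grid with step 0.1 accumulate to two grid steps, whereas rounding each update independently would move the weight by zero steps every time. That is the sense in which a small update signal can vanish on a discrete grid without error feedback. The seed replay in `es_step` means no perturbation tensor is kept between the scoring pass and the update pass; only the seeds and scalar rewards are retained.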
