2605.05693v1 May 07, 2026 cs.AI

Saliency-Aware Regularized Quantization Calibration for Large Language Models

Yanlong Zhao

Xiaoyuan Cheng

Huihang Liu

Baihua He

Xinyu Zhang

H. Zhu

Wenlong Chen

LiuPing Zeng

Zhuo Sun

Abstract

Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a predetermined calibration dataset, usually optimized via either scale search or Gram-based methods. However, from the perspective of generalization risk, calibration objectives based only on the empirical reconstruction error over limited or unrepresentative calibration data can move the quantized weights away from the original weights. This may cause the generalization risk to diverge, potentially degrading downstream performance. To address this issue, we propose Saliency-Aware Regularized Quantization Calibration (SARQC), a unified framework that augments the standard PTQ objective with a saliency-aware regularization term. This term encourages the quantized weights to stay close to the original weights during calibration, improving generalization at inference time. SARQC integrates seamlessly into existing PTQ pipelines, enhancing both scale search and Gram-based methods under a unified formulation. Extensive experiments on dense and Mixture-of-Experts LLMs demonstrate consistent improvements in perplexity and zero-shot accuracy, without additional computational overhead during inference.
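The abstract does not give the exact objective, so what follows is only a rough sketch under an assumed form. Suppose the regularized calibration loss is ||(W − Ŵ)X||²_F / n + λ Σ_j s_j ||W[:, j] − Ŵ[:, j]||², where W is the original weight matrix, Ŵ the quantized one, X the calibration activations (n columns), and s a diagonal, per-input-channel saliency vector. Under that assumption, scale search scores each candidate scale with the extra term. For Gram-based methods the term folds directly into the Gram matrix, because the whole loss equals tr(E (XXᵀ/n + λ diag(s)) Eᵀ) with E = W − Ŵ. A Gram-based solver (GPTQ, for instance, works from a matrix proportional to XXᵀ) would then receive the regularized matrix in place of the plain one. The saliency definition (mean squared activation), the round-to-nearest quantizer, the clipping grid, λ, and every function name below are hypothetical choices for illustration, not the paper's method.

```python
import numpy as np

# NOTE: hypothetical form of a saliency-regularized PTQ objective,
# written for illustration; it is not taken from the SARQC paper.


def quantize_rtn(W, alpha, bits=4):
    """Symmetric per-output-channel round-to-nearest quantization.

    alpha in (0, 1] shrinks the clipping range; alpha = 1 uses the row max.
    """
    qmax = 2 ** (bits - 1) - 1
    # One scale per output row; the floor avoids division by zero.
    scale = alpha * np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-12)
    return np.clip(np.round(W / scale), -qmax, qmax) * scale


def saliency_from_activations(X):
    """Hypothetical saliency: mean squared activation per input channel."""
    return (X ** 2).mean(axis=1)


def regularized_loss(W, W_q, X, s, lam):
    """Empirical reconstruction error plus a saliency-weighted proximity term."""
    E = W - W_q
    n = X.shape[1]
    recon = np.sum((E @ X) ** 2) / n      # ||(W - W_q) X||_F^2 / n
    prox = np.sum(s[None, :] * E ** 2)    # sum_j s_j ||W[:, j] - W_q[:, j]||^2
    return recon + lam * prox


def regularized_scale_search(W, X, lam=0.1, bits=4, grid=None):
    """Grid-search the clipping factor alpha under the regularized objective."""
    if grid is None:
        grid = np.linspace(0.5, 1.0, 11)
    s = saliency_from_activations(X)
    losses = [regularized_loss(W, quantize_rtn(W, a, bits), X, s, lam) for a in grid]
    best = int(np.argmin(losses))
    return float(grid[best]), float(losses[best])


def regularized_gram(X, lam=0.1):
    """Gram matrix a Gram-based solver would use in place of X X^T / n."""
    n = X.shape[1]
    return X @ X.T / n + lam * np.diag(saliency_from_activations(X))


# Usage on random data (shapes: W is out x in, X is in x n).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(16, 64))
alpha, loss = regularized_scale_search(W, X, lam=0.1)
```

Setting `lam=0` recovers plain reconstruction-error calibration, so in this sketch λ is the only knob that trades calibration-set fit against staying close to the original weights. Expressing the penalty through `regularized_gram` is what lets one formulation serve both scale search and Gram-based solvers.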
