2602.07596v1 Feb 07, 2026 cs.LG

Astro: Activation-guided Structured Regularization for Outlier-Robust LLM Post-Training Quantization

Xi Chen (Citations: 7, h-index: 1)

Ming Li (Citations: 1, h-index: 1)

Junxi Li (Citations: 5, h-index: 1)

Changsheng Li (Citations: 144, h-index: 5)

Peisong Wang (Citations: 342, h-index: 7)

Lizhong Ding (Citations: 0, h-index: 0)

Ye Yuan (Citations: 170, h-index: 6)

Guoren Wang (Citations: 8,890, h-index: 44)

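Weight-only post-training quantization (PTQ) is essential for deploying large language models (LLMs) efficiently, but outliers in weights and activations degrade its accuracy. Existing mitigation strategies have serious limitations: they either suppress outliers insufficiently or introduce substantial deployment inefficiencies such as inference latency, heavy preprocessing, or complex operator fusion. To address these limitations, we build on a key insight: over-parameterized LLMs often converge to flat minima, which implies a broad space of equivalent solutions in which weights can be adjusted without harming accuracy. Based on this, we propose Astro, an activation-guided structured regularization framework designed to suppress the negative effects of outliers in a hardware-friendly and efficient way. Using an activation-guided regularization objective, Astro reconstructs intrinsically robust weights, aggressively suppressing the weight outliers that correspond to high-magnitude activations without sacrificing model accuracy. Notably, Astro adds zero inference latency and is orthogonal to mainstream quantization methods such as GPTQ. Extensive experiments show that Astro achieves highly competitive performance. In particular, on LLaMA-2-7B, Astro outperforms complex learning-based rotation methods while using almost 1/3 of the quantization time.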

Original Abstract

Weight-only post-training quantization (PTQ) is crucial for efficient Large Language Model (LLM) deployment but suffers from accuracy degradation caused by weight and activation outliers. Existing mitigation strategies often face critical limitations: they either yield insufficient outlier suppression or incur significant deployment inefficiencies, such as inference latency, heavy preprocessing, or reliance on complex operator fusion. To resolve these limitations, we leverage a key insight: over-parameterized LLMs often converge to Flat Minima, implying a vast equivalent solution space where weights can be adjusted without compromising accuracy. Building on this, we propose Astro, an Activation-guided Structured Regularization framework designed to suppress the negative effects of outliers in a hardware-friendly and efficient manner. Leveraging the activation-guided regularization objective, Astro actively reconstructs intrinsically robust weights, aggressively suppressing weight outliers corresponding to high-magnitude activations without sacrificing model accuracy. Crucially, Astro introduces zero inference latency and is orthogonal to mainstream quantization methods like GPTQ. Extensive experiments show that Astro achieves highly competitive performance; notably, on LLaMA-2-7B, it achieves better performance than complex learning-based rotation methods with almost 1/3 of the quantization time.
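
The abstract states the problem (outliers degrade weight-only PTQ) but does not give Astro's objective, so the following is only an illustrative sketch of that problem and not the paper's method. It assumes plain symmetric per-row round-to-nearest quantization at 4 bits, synthetic Gaussian weights, and a made-up activation magnitude vector; the names quantize_rtn, activation_weighted_error, and act_mag are hypothetical. It shows how one large weight per row stretches the quantization grid and raises the error on all the ordinary weights, with each input channel's error scaled by its activation magnitude.

```python
import numpy as np


def quantize_rtn(w, bits=4):
    """Symmetric per-row round-to-nearest quantization; returns dequantized weights."""
    qmax = 2 ** (bits - 1) - 1
    # One scale per output row, set by that row's largest magnitude.
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale


def activation_weighted_error(w, w_q, act_mag):
    """Mean squared weight error, with each input channel scaled by its activation magnitude."""
    return float(np.mean(((w_q - w) * act_mag) ** 2))


rng = np.random.default_rng(0)

# Synthetic weight matrix: 8 output rows, 256 input channels (assumed sizes).
w = rng.normal(0.0, 0.02, size=(8, 256))

# Assumed activation magnitudes: channel 0 is a high-magnitude channel.
act_mag = np.ones(256)
act_mag[0] = 4.0

# Same weights, but with a large weight on the high-magnitude channel of every row.
w_outlier = w.copy()
w_outlier[:, 0] = 1.0

err_clean = activation_weighted_error(w, quantize_rtn(w), act_mag)
err_outlier = activation_weighted_error(w_outlier, quantize_rtn(w_outlier), act_mag)

print(f"clean weights:   {err_clean:.3e}")
print(f"with outliers:   {err_outlier:.3e}")
```

In this toy setup the outlier sets the row scale, so the remaining small weights collapse onto very few quantization levels. A regularizer that shrinks such weights before quantization, which is the general direction the abstract describes, would narrow each row's range and tighten the grid; how Astro actually does this is not specified here.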

0 Citations

0 Influential

22 Altmetric

110.0 Score
