2601.09555v1 Jan 14, 2026 cs.CL

Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats

Hui-Ling Zhen (Citations: 304, h-index: 6)
Zhenhua Dong (Citations: 2, h-index: 1)
Manyi Zhang (Citations: 12, h-index: 2)
Ji-Fu Li (Citations: 10, h-index: 1)
Zhongao Sun (Citations: 4, h-index: 1)
Haoli Bai (Citations: 324, h-index: 7)
Xianzhi Yu (Citations: 7, h-index: 1)

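Microscaling floating point (MXFP) has emerged as a promising low-precision format for large language models (LLMs). Many post-training quantization (PTQ) algorithms have been proposed, but most target integer quantization, and their applicability and behavior under MXFP formats have not been studied in depth. This work systematically investigates PTQ under MXFP formats, covering more than 7 PTQ algorithms, 15 evaluation benchmarks, and 3 LLM families. The main findings are as follows. 1) MXFP8 consistently achieves near-lossless performance, whereas MXFP4 causes substantial accuracy degradation and remains a hard problem. 2) The effectiveness of PTQ under MXFP depends heavily on format compatibility, and some algorithmic paradigms are consistently more effective than others. 3) PTQ performance shows highly consistent trends across model families and modalities; in multimodal LLMs in particular, quantization sensitivity is dominated by the language model rather than the vision encoder. 4) In MXFP4, the quantization scaling factor is a significant source of error, and a simple pre-scale optimization strategy can substantially mitigate its effect. These results provide practical guidance for adapting existing PTQ methods to MXFP quantization.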

Original Abstract

Microscaling Floating-Point (MXFP) has emerged as a promising low-precision format for large language models (LLMs). Despite various post-training quantization (PTQ) algorithms being proposed, they mostly focus on integer quantization, while their applicability and behavior under MXFP formats remain largely unexplored. To address this gap, this work conducts a systematic investigation of PTQ under MXFP formats, encompassing over 7 PTQ algorithms, 15 evaluation benchmarks, and 3 LLM families. The key findings include: 1) MXFP8 consistently achieves near-lossless performance, while MXFP4 introduces substantial accuracy degradation and remains challenging; 2) PTQ effectiveness under MXFP depends strongly on format compatibility, with some algorithmic paradigms being consistently more effective than others; 3) PTQ performance exhibits highly consistent trends across model families and modalities, in particular, quantization sensitivity is dominated by the language model rather than the vision encoder in multimodal LLMs; 4) The scaling factor of quantization is a critical error source in MXFP4, and a simple pre-scale optimization strategy can significantly mitigate its impact. Together, these results provide practical guidance on adapting existing PTQ methods to MXFP quantization.
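
Finding 4 is easier to picture with a small simulation. The following is a minimal sketch, not the paper's implementation: it assumes the MXFP4 layout from the OCP Microscaling specification (FP4 E2M1 elements, one shared power-of-two scale per block of 32 values) and fake-quantizes a block in pure Python. The helper `best_exponent` is a hypothetical stand-in for a scale search; the abstract does not describe the authors' pre-scale optimization, so it should not be read as their method. Tie-breaking in rounding and clamping of the scale exponent to the E8M0 range are simplified away.

```python
import math

# Representable magnitudes of an FP4 E2M1 element (per the OCP MX spec).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2    # exponent of the largest E2M1 value (6 = 1.5 * 2**2)
BLOCK_SIZE = 32  # number of elements that share one scale in MX formats


def shared_exponent(block):
    """Default scale exponent: floor(log2(max|x|)) minus the element emax."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0
    # frexp returns (m, e) with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    return math.frexp(amax)[1] - 1 - E2M1_EMAX


def quantize_block(block, exp):
    """Fake-quantize: divide by 2**exp, snap to the FP4 grid, multiply back."""
    scale = 2.0 ** exp
    out = []
    for v in block:
        mag = min(abs(v) / scale, FP4_E2M1_GRID[-1])  # saturate at 6
        # Nearest grid point; ties go to the smaller magnitude (simplification).
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(q * scale, v))
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_exponent(block, offsets=(-1, 0, 1)):
    """Hypothetical scale search: try nearby exponents, keep the lowest MSE."""
    base = shared_exponent(block)
    return min((base + d for d in offsets),
               key=lambda e: mse(block, quantize_block(block, e)))


if __name__ == "__main__":
    block = [7.5] + [0.1 * i for i in range(BLOCK_SIZE - 1)]
    default_exp = shared_exponent(block)
    tuned_exp = best_exponent(block)
    print("default exponent:", default_exp,
          "mse:", mse(block, quantize_block(block, default_exp)))
    print("tuned exponent:  ", tuned_exp,
          "mse:", mse(block, quantize_block(block, tuned_exp)))
```

With the default rule, a block maximum such as 7.5 gets exponent 0 and saturates to 6, which is one way the shared scale alone can dominate the error at 4 bits. Searching neighboring exponents trades that clipping error against coarser resolution for the small elements in the block.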

