2606.05682v1 Jun 04, 2026 cs.AI

Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation

C. Liu
Fang Tu
Srinivasan Manoharan
Junhua Zhao
Xin Chen
Haifeng Wu
Jianmin Wan

Demand for low-precision inference, including NVFP4-based approaches, has grown as large language models are increasingly deployed in latency- and cost-constrained production environments. Quantization-aware distillation (QAD) helps recover accuracy lost under low-bit quantization by training a quantized student to match the output distribution of a frozen higher-precision teacher via a KL-divergence loss. In this work, we first provide a representation-level diagnosis of QAD: output matching alone can mask internal degradation, because many intermediate activation geometries can yield similar teacher-aligned logits. Using centered kernel alignment (CKA), we show that KL-only QAD can reduce layerwise representational similarity relative to the BF16 teacher, with especially severe drift in RL-post-trained models. This drift correlates with downstream bottlenecks on reasoning and coding tasks, suggesting that low-bit recovery requires preserving internal geometry rather than matching outputs alone. Motivated by this finding, we propose CKA-QAD, a CKA-guided representational alignment method for NVFP4 QAD and low-bit LLM accuracy recovery. The method adds a lightweight regularizer that preserves internal representational geometry during distillation by aligning layerwise Gram matrices through CKA. Across Nemotron 3 Nano and Qwen3-4B-Thinking-2507, CKA-QAD substantially improves representational alignment and raises downstream reasoning and coding accuracy with modest training overhead. Our findings position CKA-guided representational alignment as a practical complement to output matching for quantized LLM recovery.
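
To make the idea of aligning layerwise Gram matrices through CKA concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the linear kernel, the `1 - CKA` penalty form, the weight `lam`, and all function names are assumptions made for illustration. No NVFP4 quantization is modelled; the arrays simply stand in for teacher and student logits and activations.

```python
import numpy as np


def center_gram(K):
    """Double-center an n x n Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def linear_cka(X, Y):
    """Linear CKA between two activation matrices with the same number of rows.

    X has shape (n_samples, d_x) and Y has shape (n_samples, d_y); the feature
    widths may differ because only the n x n Gram matrices are compared.
    """
    K = center_gram(X @ X.T)
    L = center_gram(Y @ Y.T)
    hsic = np.sum(K * L)  # Frobenius inner product of the centered Grams
    norm = np.sqrt(np.sum(K * K) * np.sum(L * L))
    return float(hsic / norm)


def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def kl_divergence(teacher_logits, student_logits):
    """Mean KL(teacher || student) over rows of logits."""
    lt = log_softmax(teacher_logits)
    ls = log_softmax(student_logits)
    return float(np.mean(np.sum(np.exp(lt) * (lt - ls), axis=-1)))


def cka_qad_loss(teacher_logits, student_logits,
                 teacher_acts, student_acts, lam=0.1):
    """Output-matching KL term plus an assumed (1 - CKA) penalty per layer.

    teacher_acts and student_acts are lists of per-layer activation matrices;
    lam is a hypothetical regularization weight.
    """
    kl = kl_divergence(teacher_logits, student_logits)
    penalty = np.mean([1.0 - linear_cka(t, s)
                       for t, s in zip(teacher_acts, student_acts)])
    return kl + lam * float(penalty)
```

Linear CKA is invariant to orthogonal rotations and isotropic rescaling of the features, so under this sketch the penalty constrains the pairwise geometry of the student's activations without forcing them to equal the teacher's coordinate by coordinate. In an actual training loop the same quantities would be computed with a differentiable framework on mini-batches.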

