2603.23911v1 Mar 25, 2026 cs.CL

Self-Distillation for Multi-Token Prediction

Guoliang Zhao (citations: 2, h-index: 1)

Ruobing Xie (citations: 327, h-index: 10)

An Wang (citations: 13, h-index: 2)

Shuaipeng Li (citations: 153, h-index: 6)

Huaibing Xie (citations: 211, h-index: 2)

Xingwu Sun (citations: 354, h-index: 10)

Abstract

As Large Language Models (LLMs) scale up, inference efficiency becomes a critical bottleneck. Multi-Token Prediction (MTP) can accelerate LLM inference by predicting multiple future tokens in parallel. However, existing MTP approaches still face two challenges: limited acceptance rates of MTP heads, and the difficulty of jointly training multiple MTP heads. We therefore propose MTP-D, a simple yet effective self-distillation method with minimal additional training cost, which boosts MTP head acceptance rates (+7.5%) while preserving main-head performance as far as possible. We also introduce a looped extension strategy for MTP-D, which enables effective and economical extension of MTP heads and yields a further significant inference speedup relative to 1-head MTP (+220.4%). Moreover, we systematically explore and validate key insights on distillation strategies and the potential scalability of MTP through extensive experiments on seven benchmarks. These results demonstrate that MTP-D and the looped extension strategy effectively enhance MTP-head performance and inference efficiency, facilitating the practical use of MTP in LLMs.
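
The abstract reports an acceptance-rate gain but does not define how acceptance is measured. As a rough illustration only, the sketch below assumes a greedy verification scheme in which the tokens drafted by the MTP heads are accepted up to the first mismatch with the main head's output. The function names and the toy token IDs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def accepted_prefix_length(draft: Sequence[int], verified: Sequence[int]) -> int:
    """Count the leading draft tokens that match the main head's tokens.

    Assumption: acceptance stops at the first mismatch (greedy verification).
    """
    count = 0
    for drafted, target in zip(draft, verified):
        if drafted != target:
            break
        count += 1
    return count


def acceptance_rate(
    drafts: Sequence[Sequence[int]], verifieds: Sequence[Sequence[int]]
) -> float:
    """Fraction of all drafted tokens that were accepted."""
    total = sum(len(d) for d in drafts)
    if total == 0:
        return 0.0
    accepted = sum(accepted_prefix_length(d, v) for d, v in zip(drafts, verifieds))
    return accepted / total


def tokens_per_step(
    drafts: Sequence[Sequence[int]], verifieds: Sequence[Sequence[int]]
) -> float:
    """Average tokens emitted per decoding step.

    Each step emits one main-head token plus the accepted draft tokens.
    """
    steps = len(drafts)
    if steps == 0:
        return 0.0
    accepted = sum(accepted_prefix_length(d, v) for d, v in zip(drafts, verifieds))
    return (steps + accepted) / steps


if __name__ == "__main__":
    # Toy data: three decoding steps, three drafted tokens per step.
    drafts = [[5, 7, 9], [2, 4, 6], [1, 3, 8]]
    verifieds = [[5, 7, 1], [2, 4, 6], [0, 3, 8]]
    print(acceptance_rate(drafts, verifieds))
    print(tokens_per_step(drafts, verifieds))
```

Under this assumed scheme, a higher acceptance rate directly raises the number of tokens emitted per forward pass, which is why acceptance rate and inference speedup are reported together.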
