2603.18567v1 Mar 19, 2026 cs.LG

SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding

Yineng Zhang, Ivor W. Tsang, Yonggang Wen, Jin Pan, Chao Wang, Yikai Zhu, Yubo Wang, Fan Yin, S. Shi, Ye Chen, Xiaomin Dong, Qiaoling Chen, Ji Li, Laixin Xie, Lei Yu, Tianwei Zhang, Shenggui Li

Original Abstract

Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3. SpecForge incorporates target-draft decoupling, hybrid parallelism, optimized training kernels, and integration with production-grade inference engines, enabling up to 9.9x faster EAGLE-3 training for Qwen3-235B-A22B. In addition, we release SpecBundle, a suite of production-grade EAGLE-3 draft models trained with SpecForge for mainstream open-source LLMs. Through a systematic study of speculative decoding training recipes, SpecBundle addresses the scarcity of high-quality drafts in the community, and our draft models achieve up to 4.48x end-to-end inference speedup on SGLang, establishing SpecForge as a practical foundation for real-world speculative decoding deployment.
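The abstract describes speculative decoding in one line: a lightweight draft model proposes several tokens, and the target model verifies them in one batch. The following is a minimal sketch of that draft-then-verify loop. It assumes greedy decoding with exact-match acceptance, which simplifies the rejection-sampling rule used when outputs are sampled. It is not EAGLE-3, which drafts from the target model's hidden features, and it is not SpecForge's or SGLang's API. The functions `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token prefix to its greedy
# (argmax) next token. Real drafts and targets are neural networks.
NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int], draft: NextToken, target: NextToken, k: int
) -> Tuple[List[int], int]:
    """Run one draft-then-verify step; return (new tokens, #accepted drafts)."""
    # 1. Draft phase: the cheap model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. Verify phase: the target's greedy choice at each of the k + 1
    #    positions. A real engine gets these from ONE batched forward pass;
    #    this sketch simply calls the toy target once per position.
    target_preds = [target(prefix + proposal[:i]) for i in range(k + 1)]

    # 3. Keep the longest drafted prefix the target agrees with, then append
    #    the target's own token (a correction, or a bonus token if all k match).
    accepted = 0
    while accepted < k and proposal[accepted] == target_preds[accepted]:
        accepted += 1
    return proposal[:accepted] + [target_preds[accepted]], accepted


def generate(
    prefix: List[int], draft: NextToken, target: NextToken, k: int, max_new_tokens: int
) -> Tuple[List[int], int]:
    """Greedy speculative generation; returns (tokens, #verification passes)."""
    out = list(prefix)
    passes = 0
    while len(out) - len(prefix) < max_new_tokens:
        new_tokens, _ = speculative_step(out, draft, target, k)
        out.extend(new_tokens)
        passes += 1
    return out[: len(prefix) + max_new_tokens], passes


# Toy models (hypothetical): the target counts modulo 10; the draft agrees
# except that after token 5 it wrongly guesses 0.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, passes = generate([0], toy_draft, toy_target, k=4, max_new_tokens=12)
    print(tokens)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
    print(passes)  # → 4; plain greedy decoding needs 12 target passes
```

Every emitted token is one the target would have chosen greedily, so the output is identical to decoding with the target alone. The saving comes from producing 12 tokens in 4 verification passes rather than 12. The number of drafted tokens accepted per pass depends on how closely the draft tracks the target, and that is why draft-model quality, which SpecForge and SpecBundle target, matters for end-to-end speedup.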
