2602.01469v1 Feb 01, 2026 cs.LG

P-EAGLE: Parallel-Drafting EAGLE with Scalable Training

Yueqing Sun

Citations: 32

h-index: 4

Xin Huang

Citations: 123

h-index: 5

George Karypis

Citations: 1,147

h-index: 15

Xiang Song

Citations: 10

h-index: 2

Mude Hui

Citations: 491

h-index: 9

Jaime Campos Salas

Citations: 2

h-index: 1

A. Khetan

Citations: 1,779

h-index: 12

Nathan Pemberton

Citations: 714

h-index: 8

Original Abstract

Reasoning LLMs produce longer outputs, requiring speculative decoding drafters trained on extended sequences. Parallel drafting, which predicts multiple tokens per forward pass, offers latency benefits over sequential generation, but training complexity scales quadratically with the product of sequence length and parallel positions, rendering long-context training impractical. We present P(arallel)-EAGLE, which transforms EAGLE from autoregressive to parallel multi-token prediction via a learnable shared hidden state. To scale training to long contexts, we develop a framework featuring attention mask pre-computation and sequence partitioning techniques, enabling gradient accumulation within individual sequences for parallel-prediction training. We implement P-EAGLE in vLLM and demonstrate speedups of 1.10-1.36x over autoregressive EAGLE-3 across GPT-OSS 120B, 20B, and Qwen3-Coder 30B.
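The quadratic scaling claim in the abstract can be made concrete with a back-of-the-envelope count. The sketch below is only an illustration, not the paper's own accounting: it assumes a dense attention score matrix in which each of the `seq_len` positions is expanded into `parallel_positions` prediction slots, and the function names are hypothetical. The abstract does not specify the actual mask layout.

```python
def attention_entries(seq_len: int, parallel_positions: int) -> int:
    """Count entries in a dense (n*K) x (n*K) attention score matrix.

    Assumption (not from the paper): every sequence position is expanded
    into `parallel_positions` slots and the mask is stored densely.
    """
    total_positions = seq_len * parallel_positions
    return total_positions * total_positions


def growth_factor(seq_len: int, parallel_positions: int) -> int:
    """Ratio of entries relative to single-position (autoregressive) training."""
    return attention_entries(seq_len, parallel_positions) // attention_entries(seq_len, 1)


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration only.
    n, k = 8192, 4
    print(attention_entries(n, 1))  # 8192^2 entries for one position per token
    print(attention_entries(n, k))  # (8192 * 4)^2 entries with 4 parallel positions
    print(growth_factor(n, k))      # grows as k^2 under this assumption
```

Under this assumption the cost grows with the square of the product of sequence length and parallel positions, which is why doubling either one quadruples the mask size and why the abstract treats long-context training as the bottleneck.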

2 Citations

0 Influential

7.5 Altmetric

39.5 Score
