2603.09527v1 Mar 10, 2026 cs.LG

Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation

Xuelong Li, Luxi Lin, Yuhao Chen, Zhihang Lin, Zhanpeng Zeng, Qingyu Zhang, Jixiang Luo, Rongrong Ji

Abstract

Speculative decoding accelerates LLM inference but suffers from performance degradation when the target model is fine-tuned for a specific domain. A naive solution is to retrain the draft model for every target model, which is costly and inefficient. To address this, we introduce Efficient Draft Adaptation (EDA), a parameter- and data-efficient framework for adapting draft models. EDA introduces three innovations: (1) a decoupled architecture that uses shared and private components to model the shared and target-specific output distributions separately, enabling parameter-efficient adaptation by updating only the lightweight private component; (2) a data regeneration strategy that uses the fine-tuned target model to regenerate the training data, improving the alignment between training and speculative decoding and thereby yielding a higher average acceptance length; and (3) a sample selection mechanism that prioritizes high-value data for efficient adaptation. Our experiments show that EDA effectively restores speculative decoding performance on fine-tuned models, achieving superior average acceptance lengths at a significantly lower training cost than full retraining. Code is available at https://github.com/Lyn-Lucy/Efficient-Draft-Adaptation.
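The abstract frames EDA's gains in terms of average acceptance length. The Python sketch below is not EDA's code; it only illustrates how that metric behaves under greedy (argmax) speculative decoding, where a draft proposes `k` tokens and the target keeps the longest prefix that matches its own choices, then adds one token of its own. It assumes deterministic toy next-token functions and counts the target's correction or bonus token toward each cycle's length (some papers count it, others do not). `speculative_step`, `average_acceptance_length`, `base_target`, and `fine_tuned_target` are hypothetical names introduced here for illustration.

```python
from typing import Callable, List, Sequence

# A toy "model" is a greedy next-token function: prefix of token ids -> next token id.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Run one greedy draft-then-verify cycle and return the tokens it emits."""
    # 1) The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target checks each drafted position against its own greedy choice.
    #    (A real system scores all k positions in one batched forward pass;
    #    this sequential loop yields the same accepted tokens.)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the accepted prefix plus the target's correction.
            return accepted + [expected]
        accepted.append(tok)
        ctx.append(tok)

    # 3) Every drafted token was accepted: the target adds one bonus token.
    return accepted + [target(ctx)]


def average_acceptance_length(prompt: List[int], draft: NextToken,
                              target: NextToken, k: int,
                              max_new_tokens: int) -> float:
    """Mean tokens emitted per draft-verify cycle (assumed convention: the
    target's correction or bonus token is counted)."""
    seq = list(prompt)
    cycles = 0
    generated = 0
    while generated < max_new_tokens:
        new_tokens = speculative_step(seq, draft, target, k)
        seq.extend(new_tokens)
        generated += len(new_tokens)
        cycles += 1
    return generated / cycles


# Hypothetical toy models (not from the paper): the base target counts up mod 10;
# the "fine-tuned" target skips one value after any multiple of 3.
def base_target(prefix: Sequence[int]) -> int:
    return (prefix[-1] + 1) % 10


def fine_tuned_target(prefix: Sequence[int]) -> int:
    last = prefix[-1]
    return (last + 2) % 10 if last % 3 == 0 else (last + 1) % 10


if __name__ == "__main__":
    # A draft identical to its target accepts all k tokens every cycle.
    print(average_acceptance_length([1], base_target, base_target, k=4, max_new_tokens=20))  # prints 5.0
    # A draft matching the base target, now paired with the fine-tuned target.
    print(average_acceptance_length([1], base_target, fine_tuned_target, k=4, max_new_tokens=20))
```

In this toy setup, a draft that imitates its target exactly yields k + 1 tokens per cycle, while the base-aligned draft paired with the hypothetical fine-tuned target is rejected early in some cycles and the average falls to 7/3 with the settings above. That drop is only a stand-in for the degradation the abstract describes.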
