2603.18773v1 Mar 19, 2026 cs.LG

Automatic Configuration of LLM Post-Training Pipelines

Yao Lu

Channe Chwa

Xinle Wu

Abstract

LLM post-training pipelines that combine supervised fine-tuning and reinforcement learning are difficult to configure under realistic compute budgets: the configuration space is high-dimensional and heterogeneous, stages are strongly coupled, and each end-to-end evaluation is expensive. We propose AutoPipe, a budget-aware two-stage framework for configuration selection in LLM post-training. Offline, AutoPipe learns a dataset-conditioned learning-to-rank surrogate from historical runs, capturing within-dataset preferences and providing transferable guidance toward promising regions of the configuration space. Online, for a new dataset, AutoPipe uses the offline guidance to steer Bayesian optimization and models dataset-specific deviations with a Gaussian-process residual surrogate. To reduce evaluation cost, each trial is early-stopped and scored by a learned predictor that maps early training signals to a low-cost proxy for final post-training performance. Experiments on biomedical reasoning tasks show that AutoPipe consistently outperforms offline-only baselines and achieves performance comparable to the strongest online HPO baselines while using less than 10% of their computational cost.
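
The online stage described in the abstract (an offline prior steering Bayesian optimization, with a Gaussian-process surrogate fitted to dataset-specific residuals) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional configuration `(lr, kl_coef)`, the quadratic `offline_prior`, the RBF kernel, the UCB acquisition rule, and the toy scores are all assumptions chosen for illustration. The observed scores stand in for the early-stopped proxy values mentioned in the abstract.

```python
import math

def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel between two configuration vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def offline_prior(config):
    """Hypothetical stand-in for the score of an offline ranking surrogate."""
    lr, kl_coef = config
    return -(lr - 0.4) ** 2 - (kl_coef - 0.6) ** 2

def posterior(x, X, residuals, noise=1e-4):
    """Return (mean, variance) at x: offline prior plus GP residual posterior."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(x, a) for a in X]
    alpha = solve(K, residuals)   # K^{-1} r
    v = solve(K, k)               # K^{-1} k
    mean = offline_prior(x) + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var

def suggest(candidates, X, y, beta=2.0):
    """Pick the candidate maximizing an upper confidence bound (UCB)."""
    # The GP models only what the offline prior fails to explain.
    residuals = [yi - offline_prior(xi) for xi, yi in zip(X, y)]
    def ucb(c):
        m, var = posterior(c, X, residuals)
        return m + beta * math.sqrt(var)
    return max(candidates, key=ucb)

# Toy data: configurations already tried and their made-up proxy scores.
X = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.2)]
y = [-0.30, 0.05, -0.45]
candidates = [(i / 4, j / 4) for i in range(5) for j in range(5)]
next_config = suggest(candidates, X, y)
```

Far from any observed configuration the residual posterior falls back to zero mean, so the prediction reverts to the offline prior; near observed configurations the residual term corrects the prior toward the measured scores. That is the intuition behind using a residual surrogate rather than fitting a Gaussian process to the raw scores.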

