2604.12391v1 Apr 14, 2026 cs.CV

Chain-of-Models Pre-Training: A New Approach to Accelerating the Training of Vision Foundation Models

Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models

Chao Li

Citations: 61

h-index: 4

Anbang Yao

Citations: 52

h-index: 3

Shigeng Wang

Citations: 65

h-index: 3

Xiaolong Liu

Citations: 55

h-index: 4

Jiawei Fan

Citations: 20,653

h-index: 72

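This paper proposes Chain-of-Models Pre-Training (CoM-PT), a new method for accelerating the training of vision foundation models (VFMs). CoM-PT differs fundamentally from existing acceleration methods: instead of optimizing each model individually, it is designed to accelerate the training pipeline at the level of the whole model family, so that it scales efficiently as the family grows. Specifically, CoM-PT defines a pre-training sequence by arranging the models in a chain in ascending order of model size. The smallest model in the chain is pre-trained individually in the standard way, and each remaining model is trained efficiently by sequentially reusing the knowledge of its smaller predecessor in both the parameter space and the feature space. As a result, CoM-PT substantially reduces training cost while the models reach performance that is mostly superior to standard individual training, as validated in extensive experiments on 45 datasets. Notably, this efficient scaling produces a striking effect: training more models makes the pipeline more efficient. For example, when pre-training on CC3M: i) with ViT-L as the largest model, adding smaller models to the chain reduces computational complexity by up to 72%; ii) within a fixed range of model sizes, as the VFM family grows to 3, 4, and 7 models, the acceleration ratio of CoM-PT rises from 4.13x to 5.68x and 7.09x. Because CoM-PT is agnostic to the specific pre-training paradigm, the code is released to encourage extensions to more computationally intensive scenarios, such as large language model pre-training.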
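The chain ordering described above can be sketched in a few lines of Python. This is only an illustration of the scheduling idea, not the authors' released code: the `ModelSpec` and `Stage` types, the `build_model_chain` function, and the model names and parameter counts are all hypothetical, and the actual knowledge transfer in parameter and feature space is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    params_m: float  # illustrative parameter count in millions (assumption)


@dataclass(frozen=True)
class Stage:
    model: str
    # None means standard individual pre-training; otherwise the name of the
    # smaller predecessor whose knowledge would be reused.
    predecessor: Optional[str]


def build_model_chain(family: List[ModelSpec]) -> List[Stage]:
    """Order a model family by ascending size and link each model to its predecessor."""
    ordered = sorted(family, key=lambda spec: spec.params_m)
    stages: List[Stage] = []
    previous: Optional[str] = None
    for spec in ordered:
        stages.append(Stage(model=spec.name, predecessor=previous))
        previous = spec.name
    return stages


if __name__ == "__main__":
    # Hypothetical family; names and sizes are placeholders, not from the paper.
    family = [
        ModelSpec("large", 300.0),
        ModelSpec("tiny", 6.0),
        ModelSpec("small", 22.0),
    ]
    for stage in build_model_chain(family):
        source = stage.predecessor or "standard individual pre-training"
        print(f"{stage.model}: trained from {source}")
```

Only the first stage has no predecessor, which mirrors the statement that the smallest model alone is pre-trained in the standard way.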

Original Abstract

In this paper, we present Chain-of-Models Pre-Training (CoM-PT), a novel performance-lossless training acceleration method for vision foundation models (VFMs). This approach fundamentally differs from existing acceleration methods in its core motivation: rather than optimizing each model individually, CoM-PT is designed to accelerate the training pipeline at the model family level, scaling efficiently as the model family expands. Specifically, CoM-PT establishes a pre-training sequence for the model family, arranged in ascending order of model size, called a model chain. In this chain, only the smallest model undergoes standard individual pre-training, while the other models are efficiently trained through sequential inverse knowledge transfer from their smaller predecessors by jointly reusing the knowledge in the parameter space and the feature space. As a result, CoM-PT enables all models to achieve performance that is mostly superior to standard individual training while significantly reducing training cost, and this is extensively validated across 45 datasets spanning zero-shot and fine-tuning tasks. Notably, its efficient scaling property yields a remarkable phenomenon: training more models even results in higher efficiency. For instance, when pre-training on CC3M: i) given ViT-L as the largest model, progressively prepending smaller models to the model chain reduces computational complexity by up to 72%; ii) within a fixed model size range, as the VFM family scales across 3, 4, and 7 models, the acceleration ratio of CoM-PT exhibits a striking leap: from 4.13X to 5.68X and 7.09X. Since CoM-PT is naturally agnostic to specific pre-training paradigms, we open-source the code to spur further extensions in more computationally intensive scenarios, such as large language model pre-training.
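To put the two kinds of figures in the abstract on a common scale, the following sketch converts between a fractional cost reduction and a speedup factor. It assumes that the acceleration ratio means baseline cost divided by accelerated cost; the abstract does not state how the ratio is defined, so this is an assumption, and the function names are hypothetical.

```python
def speedup_from_cost_reduction(reduction: float) -> float:
    """Speedup factor implied by a fractional cost reduction (0.72 means 72%)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def cost_reduction_from_speedup(speedup: float) -> float:
    """Fractional cost reduction implied by a speedup factor."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Figures quoted in the abstract; the conversion itself is an assumption.
    print(f"72% reduction -> {speedup_from_cost_reduction(0.72):.2f}x")
    for ratio in (4.13, 5.68, 7.09):
        saved = cost_reduction_from_speedup(ratio)
        print(f"{ratio}x -> {saved:.1%} of baseline cost saved")
```

Under this reading, each extra step up in the acceleration ratio corresponds to a smaller absolute saving, because the remaining cost is already a small fraction of the baseline.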
