2602.04937v1 Feb 04, 2026 cs.LG

Linear Model Merging Unlocks Simple and Scalable Multimodal Data Mixture Optimization

Davide Berasi

Citations: 7

h-index: 1

Matteo Farina

Citations: 65

h-index: 4

Massimiliano Mancini

Citations: 230

h-index: 8

Elisa Ricci

Citations: 143

h-index: 8

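Choosing the optimal data mixture is critical for successful Supervised Fine-Tuning (SFT) of multimodal large language models. However, determining the optimal mixture weights across multiple domain-specific datasets is a major bottleneck, owing to the size of the combinatorial search space and the high cost of even a single training run. This is known as the Data Mixture Optimization (DMO) problem. Model merging, by contrast, unifies domain-specific experts through parameter interpolation. This strategy is efficient because it requires only one training run per domain, but it often fails to yield an optimal model. In this work, we combine the strengths of both approaches and use model merging as an efficient strategy for estimating the performance of different data mixtures. We train domain-specific multimodal experts and evaluate weighted combinations of them in parameter space to estimate the effectiveness of the corresponding data mixtures. Through extensive experiments on 14 multimodal benchmarks, we show empirically that the merged proxy models exhibit high rank correlation with models trained on the actual data mixtures. This decouples the search for the optimal mixture from resource-intensive training, providing a scalable and efficient strategy for exploring the complex space of mixture weights. Code is publicly available at: https://github.com/BerasiDavide/mLLMs_merging_4_DMO.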

Original Abstract

Selecting the best data mixture is critical for successful Supervised Fine-Tuning (SFT) of Multimodal Large Language Models. However, determining the optimal mixture weights across multiple domain-specific datasets remains a significant bottleneck due to the combinatorial search space and the high cost associated with even a single training run. This is the so-called Data Mixture Optimization (DMO) problem. On the other hand, model merging unifies domain-specific experts through parameter interpolation. This strategy is efficient, as it only requires a single training run per domain, yet oftentimes leads to suboptimal models. In this work, we take the best of both worlds, studying model merging as an efficient strategy for estimating the performance of different data mixtures. We train domain-specific multimodal experts and evaluate their weighted parameter-space combinations to estimate the efficacy of corresponding data mixtures. We conduct extensive experiments on 14 multimodal benchmarks, and empirically demonstrate that the merged proxy models exhibit a high rank correlation with models trained on actual data mixtures. This decouples the search for optimal mixtures from the resource-intensive training process, thereby providing a scalable and efficient strategy for navigating the complex landscape of mixture weights. Code is publicly available at https://github.com/BerasiDavide/mLLMs_merging_4_DMO.
