2604.01674v1 Apr 02, 2026 cs.AI

Can Heterogeneous Language Models Be Fused?

Liang He (Citations: 476, h-index: 13)
Shilian Chen (Citations: 23, h-index: 3)
Wen Wu (Citations: 6, h-index: 2)
Xin Li (Citations: 29, h-index: 2)
Qifeng Feng (Citations: 1, h-index: 1)
Jie Zhou (Citations: 22, h-index: 2)
Qin Chen (Citations: 796, h-index: 15)

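Model merging aims to combine multiple expert models into a single model, exploiting their complementary strengths without the inference-time cost of ensembling. Recent work has shown that merging can be highly effective when all source models are homogeneous, that is, derived from the same pretrained backbone and therefore sharing the same parameter coordinates or compatible task vectors. In open model ecosystems, however, useful experts are often built on different families such as Llama, Qwen, and Mistral, so this assumption is increasingly unrealistic. In such heterogeneous settings, direct weight-space merging becomes ill-posed because of architectural mismatch, latent basis misalignment, and amplified cross-source conflict. To address this, we propose HeteroFusion, a method for fusing heterogeneous language models that consists of two main components. First, topology-based alignment transfers knowledge across different backbones by matching functional module structures rather than raw tensor coordinates. Second, conflict-aware denoising suppresses incompatible or noisy transfer signals during fusion. We also provide analytical justification that preserving the target adapter basis while predicting structured updates yields a stable, well-conditioned transfer process. Across heterogeneous transfer, multi-source fusion, robustness to noisy sources, and cross-family generalization settings, HeteroFusion consistently outperforms strong merging, fusion, and ensemble baselines.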

Original Abstract

Model merging aims to integrate multiple expert models into a single model that inherits their complementary strengths without incurring the inference-time cost of ensembling. Recent progress has shown that merging can be highly effective when all source models are \emph{homogeneous}, i.e., derived from the same pretrained backbone and therefore share aligned parameter coordinates or compatible task vectors. Yet this assumption is increasingly unrealistic in open model ecosystems, where useful experts are often built on different families such as Llama, Qwen, and Mistral. In such \emph{heterogeneous} settings, direct weight-space fusion becomes ill-posed due to architectural mismatch, latent basis misalignment, and amplified cross-source conflict. We address this problem with \texttt{HeteroFusion} for heterogeneous language model fusion, which consists of two key components: topology-based alignment that transfers knowledge across heterogeneous backbones by matching functional module structures instead of raw tensor coordinates, and conflict-aware denoising that suppresses incompatible or noisy transfer signals during fusion. We further provide analytical justification showing that preserving the target adapter basis while predicting structured updates leads to a stable and well-conditioned transfer process. Across heterogeneous transfer, multi-source fusion, noisy-source robustness, and cross-family generalization settings, \texttt{HeteroFusion} consistently outperforms strong merging, fusion, and ensemble baselines.
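
The homogeneous assumption described in the abstract can be illustrated with a small sketch. This is not HeteroFusion and is not taken from the paper. It is a toy, task-arithmetic-style merge over a hypothetical parameter store (a plain dict mapping parameter names to lists of floats), with made-up names such as `layer0.w`. It only aims to show why merging by shared parameter coordinates works when experts come from one backbone and why the same operation is undefined when names or shapes differ.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of values.
Params = Dict[str, List[float]]


def task_vector(base: Params, expert: Params) -> Params:
    """Return expert - base per parameter; requires identical names and sizes."""
    if base.keys() != expert.keys():
        raise ValueError("parameter names differ: models do not share a backbone")
    delta: Params = {}
    for name, base_vals in base.items():
        expert_vals = expert[name]
        if len(base_vals) != len(expert_vals):
            raise ValueError(f"shape mismatch in {name!r}")
        delta[name] = [e - b for e, b in zip(expert_vals, base_vals)]
    return delta


def merge(base: Params, experts: List[Params], scale: float = 1.0) -> Params:
    """Add the scaled sum of the experts' task vectors to the shared base."""
    merged = {name: list(vals) for name, vals in base.items()}
    for expert in experts:
        for name, delta in task_vector(base, expert).items():
            merged[name] = [m + scale * d for m, d in zip(merged[name], delta)]
    return merged


# Homogeneous case: both experts were derived from the same base.
base = {"layer0.w": [0.0, 1.0], "layer1.w": [2.0]}
expert_a = {"layer0.w": [0.5, 1.0], "layer1.w": [2.0]}
expert_b = {"layer0.w": [0.0, 1.0], "layer1.w": [3.0]}
merged = merge(base, [expert_a, expert_b])

# Heterogeneous case: a model from another family has different names and
# shapes, so a coordinate-wise difference against `base` cannot be formed.
other_family = {"block0.w": [0.1, 0.2, 0.3]}
try:
    merge(base, [other_family])
    heterogeneous_ok = True
except ValueError:
    heterogeneous_ok = False
```

In this toy setup the homogeneous merge succeeds because every parameter has a counterpart at the same coordinate, while the heterogeneous call is rejected. That gap is the one the abstract says topology-based alignment is meant to bridge, by matching functional module structures instead of raw tensor coordinates.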

0 Citations

0 Influential

7.5 Altmetric

37.5 Score
