2602.05495v2 Feb 05, 2026 cs.CL

Transport and Merge: Cross-Architecture Merging for Large Language Models

Yuxin Chen

Citations: 321

h-index: 7

Tat-Seng Chua

Citations: 2,957

h-index: 30

An Zhang

Citations: 131

h-index: 6

Jingnan Zheng

Citations: 93

h-index: 4

Chenhang Cui

Citations: 28

h-index: 3

Xiang Wang

Citations: 715

h-index: 14

Binyu Yang

Citations: 35

h-index: 2

Fei Shen

Citations: 3

h-index: 1

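Large language models (LLMs) achieve strong performance by scaling model capacity and training data, but many real-world applications use smaller models that are trained or adapted with limited resources. This gap highlights the need for mechanisms that transfer knowledge from large, high-resource models to small, low-resource models. Model merging offers an effective knowledge-transfer mechanism, but most existing methods assume architecture-compatible models, which makes direct knowledge transfer from large, high-resource LLMs to heterogeneous small models difficult. In this work, we propose a cross-architecture merging framework based on optimal transport (OT). The framework aligns activations to infer neuron-to-neuron correspondences between heterogeneous models. The resulting transport plans guide direct weight-space fusion, so that knowledge from the high-resource model can be transferred effectively to the low-resource model using only a small amount of input data. Extensive experiments across a range of low-resource languages and specialized domains show that the proposed method consistently improves the performance of the target models.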

Original Abstract

Large language models (LLMs) achieve strong capabilities by scaling model capacity and training data, yet many real-world deployments rely on smaller models trained or adapted from low-resource data. This gap motivates the need for mechanisms to transfer knowledge from large, high-resource models to smaller, low-resource targets. While model merging provides an effective transfer mechanism, most existing approaches assume architecture-compatible models and therefore cannot directly transfer knowledge from large high-resource LLMs to heterogeneous low-resource targets. In this work, we propose a cross-architecture merging framework based on optimal transport (OT) that aligns activations to infer cross-neuron correspondences between heterogeneous models. The resulting transport plans are then used to guide direct weight-space fusion, enabling effective high-resource to low-resource transfer using only a small set of inputs. Extensive experiments across low-resource languages and specialized domains demonstrate consistent improvements over target models.
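
The pipeline outlined in the abstract (align activations with OT, then use the transport plan to fuse weights) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the cost function, the OT solver, or the fusion rule, so the correlation-based cost, the entropic Sinkhorn solver, the column-normalized mapping, and the interpolation weight `alpha` below are all assumptions chosen for illustration. The function names `activation_cost`, `sinkhorn`, and `transport_and_merge` are hypothetical.

```python
import numpy as np


def activation_cost(act_src, act_tgt, eps=1e-8):
    """Cost[i, j] = 1 - correlation of source neuron i and target neuron j.

    Both inputs have shape (n_inputs, n_neurons) and must come from the
    same small set of inputs. (Assumed cost; the abstract names none.)
    """
    zs = (act_src - act_src.mean(axis=0)) / (act_src.std(axis=0) + eps)
    zt = (act_tgt - act_tgt.mean(axis=0)) / (act_tgt.std(axis=0) + eps)
    corr = zs.T @ zt / act_src.shape[0]
    return 1.0 - corr


def sinkhorn(cost, a, b, reg=0.05, n_iters=500):
    """Entropic OT via Sinkhorn iterations (assumed solver).

    Returns a plan P of shape (len(a), len(b)) whose column sums equal b
    and whose row sums approach a as the iterations converge.
    """
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def transport_and_merge(W_src, W_tgt, P_out, P_in, alpha=0.5):
    """Map source weights into the target's shape, then interpolate.

    W_src: (out_s, in_s), W_tgt: (out_t, in_t),
    P_out: (out_s, out_t), P_in: (in_s, in_t).
    Column-normalizing each plan makes every target neuron a convex
    combination of source neurons (assumed fusion rule).
    """
    T_out = P_out / P_out.sum(axis=0, keepdims=True)
    T_in = P_in / P_in.sum(axis=0, keepdims=True)
    W_mapped = T_out.T @ W_src @ T_in  # shape (out_t, in_t)
    return (1.0 - alpha) * W_tgt + alpha * W_mapped


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy activations: 32 shared inputs, a wider source and a narrower target.
    in_s, in_t = rng.normal(size=(32, 5)), rng.normal(size=(32, 3))
    out_s, out_t = rng.normal(size=(32, 6)), rng.normal(size=(32, 4))
    P_in = sinkhorn(activation_cost(in_s, in_t), np.full(5, 1 / 5), np.full(3, 1 / 3))
    P_out = sinkhorn(activation_cost(out_s, out_t), np.full(6, 1 / 6), np.full(4, 1 / 4))
    merged = transport_and_merge(rng.normal(size=(6, 5)), rng.normal(size=(4, 3)), P_out, P_in)
    print(merged.shape)
```

Because the plans are rectangular, the source and target layers may have different widths, which is the point of the cross-architecture setting. Setting `alpha=0.0` in this sketch returns the target weights unchanged.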

0 Citations

0 Influential

15 Altmetric

75.0 Score

