2606.16456v1 Jun 15, 2026 cs.LG

SPRI: SVD-Partitioned Residual Initialization for Data-Constrained MoE Upcycling

Rui Mao, Chen Xu, Jingbo Zhu, Chunxiang Jin, Yuang Li, Weiqiao Shan, Yuhao Zhang, Yingfeng Luo, Tong Zheng, Yuchen Qiao, Yingqing Yuan, Jingdong Chen, Tong Xiao

Mixture-of-Experts (MoE) models enable efficient scaling, but training them from scratch remains prohibitively expensive. MoE upcycling mitigates this cost by converting pretrained dense models into sparse MoE models. However, existing upcycling methods typically rely on large-scale continued training and often perform poorly under data-constrained supervised adaptation, because they either produce homogeneous experts or perturb the pretrained parameters too disruptively. In this setting, effective upcycling must exploit the structure of the pretrained weights while introducing sufficient diversity among the routed experts. To this end, we propose SVD-Partitioned Residual Initialization (SPRI), which distributes SVD-partitioned residuals derived from the pretrained feed-forward network (FFN) weights across routed experts, introducing controlled expert diversity grounded in the pretrained spectral structure. We further introduce a two-stage training strategy to improve adaptation stability. We evaluate SPRI on multilingual speech-to-text translation, a setting in which limited supervised data makes MoE upcycling challenging and multiple target languages provide natural routing heterogeneity. On CoVoST2 across 15 En-to-XX directions, SPRI improves average BLEU and COMET over fully fine-tuned dense models by 2.58 and 3.32 points, respectively, and outperforms the prior best MoE upcycling baseline by 3.39 BLEU and 4.34 COMET points.
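The abstract describes SPRI only at a high level. The NumPy sketch below illustrates one possible reading of "SVD-partitioned residuals." The assumptions are ours, not the paper's: the top-`rank` singular components of an FFN weight matrix form a principal part shared by every routed expert, and the remaining components are split round-robin into disjoint residual slices, one per expert. The function name `spri_like_init` and its parameters are hypothetical. The printed check confirms that the principal part plus all residual slices reconstructs the original matrix, so each expert starts from pretrained spectral structure while differing from the others.

```python
import numpy as np


def spri_like_init(W, num_experts, rank):
    """Hypothetical sketch of SVD-partitioned residual expert initialization.

    Assumptions (not taken from the paper): the top-`rank` singular components
    form a principal part shared by all experts, and the remaining singular
    components are split round-robin into `num_experts` disjoint residual slices.
    Returns (principal, residuals, experts), where experts[k] = principal + residuals[k].
    """
    # Thin SVD: W = U @ diag(S) @ Vt, singular values sorted in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)

    # Shared principal part built from the leading singular components.
    principal = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

    residual_idx = np.arange(rank, S.shape[0])
    residuals, experts = [], []
    for k in range(num_experts):
        # Round-robin partition: expert k takes components rank+k, rank+k+E, ...
        idx = residual_idx[k::num_experts]
        R_k = (U[:, idx] * S[idx]) @ Vt[idx, :]
        residuals.append(R_k)
        experts.append(principal + R_k)
    return principal, residuals, experts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 6))  # stand-in for one pretrained FFN weight matrix
    principal, residuals, experts = spri_like_init(W, num_experts=2, rank=2)
    # The shared part plus all residual slices recovers the dense weights.
    print(np.allclose(principal + sum(residuals), W))  # True
    # Experts start from the same principal part but carry different residuals.
    print(np.allclose(experts[0], experts[1]))  # False
```

Round-robin assignment is a choice made here so that each expert receives a mix of larger and smaller residual singular values; assigning contiguous blocks of the spectrum would be an equally plausible alternative, and the paper's actual partitioning and any rescaling of the residuals may differ.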
