2605.27358v1 May 26, 2026 cs.LG

MobileMoE: Scaling On-Device Mixture of Experts

Ernie Chang
Zechun Liu
Raghuraman Krishnamoorthi
Yanbei Chen
Han Huang
Jacob Szwejbka
Digant Desai
Vikas Chandra

Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters (0.3-0.9B active and 1.3-5.3B total) that establish a new Pareto frontier for on-device LLMs. We first formulate an on-device MoE scaling law that jointly optimizes the MoE architecture under mobile memory and compute constraints, identifying an on-device sweet spot (moderate sparsity with fine-grained and shared experts) that is both memory- and compute-optimal. Building on the derived architectures, we train MobileMoE with a four-stage recipe covering pre-training, mid-training, instruction fine-tuning, and quantization-aware training, all on open-source datasets. Across 14 benchmarks, MobileMoE matches or exceeds leading on-device dense LLMs with $2$-$4\times$ fewer inference FLOPs, and matches or surpasses the state-of-the-art MoE model OLMoE-1B-7B with up to 60% fewer parameters. To bridge the last mile to mobile deployment, we provide the first efficient MoE inference on commodity smartphones, together with comprehensive on-device profiling. At comparable INT4 weight memory, MobileMoE-S delivers $1.8$-$3.8\times$ faster prefill and $2.2$-$3.4\times$ faster decode than the dense baseline MobileLLM-Pro.
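
To make the active-versus-total distinction concrete, the following is a back-of-the-envelope sketch, not the authors' profiling method. It assumes INT4 weights cost exactly 4 bits per parameter (ignoring quantization scales, zero-points, and any layers kept at higher precision) and uses the common rule of thumb of roughly 2 forward FLOPs per active parameter per token. The function names are hypothetical, and pairing the low and high ends of the two quoted ranges into single configurations is also an assumption; the abstract gives only the ranges.

```python
def int4_weight_gb(total_params: float, bits_per_weight: float = 4.0) -> float:
    """Weight storage in GB (1e9 bytes); assumes no scale/zero-point overhead."""
    return total_params * bits_per_weight / 8 / 1e9


def flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward cost: about 2 FLOPs per active parameter per token."""
    return 2.0 * active_params


# Endpoints of the ranges quoted in the abstract, as (active, total) parameters.
# Pairing the endpoints this way is an assumption made for illustration only.
configs = {
    "low end": (0.3e9, 1.3e9),
    "high end": (0.9e9, 5.3e9),
}

for name, (active, total) in configs.items():
    print(
        f"{name}: {int4_weight_gb(total):.2f} GB INT4 weights, "
        f"{flops_per_token(active) / 1e9:.1f} GFLOPs/token, "
        f"{active / total:.0%} of parameters active per token"
    )
```

Under these assumptions, memory scales with total parameters (about 0.65 GB to 2.65 GB of weights) while per-token compute scales only with active parameters, which is the trade-off the on-device scaling law is described as optimizing.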
