arXiv:2601.02780v2 [cs.CL], Jan 06, 2026

MiMo-V2-Flash Technical Report

Bofei Gao, Houbin Zhang, Bo Yang, Chun Chen, Yizhao Gao, Jianyu Wei, Qihao Zhang, Yu Cheng, Shimao Chen, Zheng-Yu Tang, Zi-Ang Jiang, Yi-Hao Song, Shijie Cao, Xi Xiao, Bing Xia, Bowen Shen, Chen Zhang, Chenhong He, Chiheng Lou, Fuli Luo, Gang Wang, Gang Xie, Hailin Zhang, Hanglong Lv, Hanyu Li, Heyu Chen, Hong-Mei Xu, Huaqiu Liu, Jiangshan Duo, Jiebao Xiao, Jinhao Dong, Jun-Miao Shi, J. Hu, Kainan Bao, Kang Zhou, Lei Li, Liang Zhao, Linghao Zhang, Peidian Li, Qian Chen, Shao-yang Liu, Shi-liang Yu, Shouqiu Yu, Shuo Liu, Tian-Yu Zhou, Wei Su, Weikun Wang, Wenhan Ma (Peking University), Xia Deng, Bo Mao, Bowen Ye, C. Cai, Chenghua Wang, Chengxuan Zhu, Chong Ma, Chunan Li, Dawei Zhu, Deshan Xiao, Dong Zhang, Duo Zhang, Fang Liu, Feiyu Yang, Feng Shi, Guoan Wang, Hao Tian, Hao Wu, Hengxu Qu, Hong Yi, Hongxu An, Hongyi Guan, Xing Zhang, Yi-Tong Yan, Yihao Zhao, Ying Lai, Yu Tian, Yudong Wang, Zheng Wen, Zhichao Song, Zhixian Zheng, Jiantao Wen, Jiarui Sun, Jiawei Li, Jinlong Xue, Jun Xia, Kai Fang, Menghang Zhu, Nuo Chen, Qian Tu, Qiying Wang, Rang Li, Rui Ma, Shao-Qiang Zhang, Shengfan Wang, Shicheng Li, Shuhao Gu, Shu-Yue Ren, Sirui Deng, Tao Guo, Tianyang Lu, Weiji Zhuang, Weikang Zhang, Weimin Xiong, Wen-Jie Huang, Wenyu Yang, Xin Zhang, Xing Yong, Xu Wang, Xueyang Xie, Yilin Jiang, Yixin Yang, Yongzhe He, Yuanyu Tu, Yu-Jie Dong, Yuchen Liu, Yue Ma, Yue Yu, Yu-Cui Xiang, Zhaojun Huang, Zhenrui Lin, Zhipeng Xu, Zhiyang Chen, Zhonghua Deng, Zihan Zhang, Zihao Yue

Abstract

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, with a 128-token sliding window under a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), employing a native 32k context length and subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense and token-level reward, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves up to 3.6 acceptance length and 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
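
The hybrid attention layout described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the layer count of 12, the placement of the global layer at the end of each group of six, and the convention that the 128-token window includes the current token are all assumptions made for illustration. Only the 5:1 ratio, the 128-token window, and the 309B/15B parameter counts come from the abstract.

```python
from typing import List, Optional


def layer_pattern(num_layers: int, swa_per_global: int = 5) -> List[str]:
    """Assumed schedule: each group of (swa_per_global + 1) layers ends in a global layer."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Causal mask; mask[q][k] is True when query q may attend to key k.

    window=None gives full (global) causal attention. Otherwise each query
    sees at most `window` positions, counting itself (an assumed convention).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q
            in_window = window is None or (q - k) < window
            row.append(causal and in_window)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Hypothetical 12-layer stack, chosen only to show two full 5:1 groups.
    print(layer_pattern(12))

    # Keys visible to the last query of a 256-token sequence.
    swa = attention_mask(seq_len=256, window=128)
    full = attention_mask(seq_len=256)
    print("SWA keys visible:", sum(swa[255]))
    print("global keys visible:", sum(full[255]))

    # Share of parameters active per token, from the abstract's 15B of 309B.
    print(f"active fraction: {15 / 309:.1%}")
```

Under these assumptions, a sliding-window layer attends to a fixed number of keys per query regardless of sequence length, while the global layers see the full causal prefix.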
