2601.02780v2 Jan 06, 2026 cs.CL

MiMo-V2-Flash Technical Report

Bofei Gao, Houbin Zhang, Bo Yang, Chun Chen, Yizhao Gao, Jianyu Wei, Qihao Zhang, Yu Cheng, Shimao Chen, Zheng-Yu Tang, Zi-Ang Jiang, Yi-Hao Song, Shijie Cao, Xi Xiao, Bing Xia, Bowen Shen, Chen Zhang, Chenhong He, Chiheng Lou, Fuli Luo, Gang Wang, Gang Xie, Hailin Zhang, Hanglong Lv, Hanyu Li, Heyu Chen, Hong-Mei Xu, Huaqiu Liu, Jiangshan Duo, Jiebao Xiao, Jinhao Dong, Jun-Miao Shi, J. Hu, Kainan Bao, Kang Zhou, Lei Li, Liang Zhao, Linghao Zhang, Peidian Li, Qian Chen, Shao-yang Liu, Shi-liang Yu, Shouqiu Yu, Shuo Liu, Tian-Yu Zhou, Wei Su, Weikun Wang, Wenhan Ma, Xia Deng, Bo Mao, Bowen Ye, C. Cai, Chenghua Wang, Chengxuan Zhu, Chong Ma, Chunan Li, Dawei Zhu, Deshan Xiao, Dong Zhang, Duo Zhang, Fang Liu, Feiyu Yang, Feng Shi, Guoan Wang, Hao Tian, Hao Wu, Hengxu Qu, Hong Yi, Hongxu An, Hongyi Guan, Xing Zhang, Yi-Tong Yan, Yihao Zhao, Ying Lai, Yu Tian, Yudong Wang, Zheng Wen, Zhichao Song, Zhixian Zheng, Jiantao Wen, Jiarui Sun, Jiawei Li, Jinlong Xue, Jun Xia, Kai Fang, Menghang Zhu, Nuo Chen, Qian Tu, Qiying Wang, Rang Li, Rui Ma, Shao-Qiang Zhang, Shengfan Wang, Shicheng Li, Shuhao Gu, Shu-Yue Ren, Sirui Deng, Tao Guo, Tianyang Lu, Weiji Zhuang, Weikang Zhang, Weimin Xiong, Wen-Jie Huang, Wenyu Yang, Xin Zhang, Xing Yong, Xu Wang, Xueyang Xie, Yilin Jiang, Yixin Yang, Yongzhe He, Yuanyu Tu, Yu-Jie Dong, Yuchen Liu, Yue Ma, Yue Yu, Yu-Cui Xiang, Zhaojun Huang, Zhenrui Lin, Zhipeng Xu, Zhiyang Chen, Zhonghua Deng, Zihan Zhang, Zihao Yue

Abstract

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention at a 5:1 hybrid ratio (five SWA layers for every global-attention layer), using a 128-token sliding window. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP) at a native 32k context length, which is subsequently extended to 256k. To scale post-training compute efficiently, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense, token-level rewards, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2 despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves an acceptance length of up to 3.6 and a decoding speedup of up to 2.6x with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
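To make the 5:1 hybrid layout and the 128-token window concrete, here is a minimal sketch. It assumes that one global-attention layer closes each block of five SWA layers (the actual placement in MiMo-V2-Flash may differ) and uses a hypothetical 12-layer stack. It also counts cached key/value positions, because an SWA layer only needs to keep the most recent 128 tokens. The function names are illustrative, not the model's implementation.

```python
WINDOW = 128       # sliding-window size stated in the abstract
HYBRID_RATIO = 5   # SWA layers per global-attention layer (5:1)


def layer_schedule(num_layers: int, ratio: int = HYBRID_RATIO) -> list[str]:
    """Label each layer 'swa' or 'global'.

    Assumption: one global layer closes every block of `ratio` SWA layers.
    """
    return ["global" if (i + 1) % (ratio + 1) == 0 else "swa"
            for i in range(num_layers)]


def can_attend(q: int, k: int, kind: str, window: int = WINDOW) -> bool:
    """Whether query position q may attend to key position k."""
    if k > q:
        return False          # causal: never attend to future tokens
    if kind == "global":
        return True           # global layers see the whole prefix
    return q - k < window     # SWA: only the `window` most recent tokens, incl. q itself


def kv_cache_tokens(seq_len: int, schedule: list[str], window: int = WINDOW) -> int:
    """Total cached key/value positions summed over all layers for one sequence."""
    return sum(seq_len if kind == "global" else min(seq_len, window)
               for kind in schedule)


if __name__ == "__main__":
    schedule = layer_schedule(12)  # hypothetical 12-layer stack
    print(schedule.count("swa"), schedule.count("global"))  # 10 2
    hybrid = kv_cache_tokens(32_768, schedule)
    full = kv_cache_tokens(32_768, ["global"] * 12)
    print(hybrid, full)  # 66816 393216
```

With a 32,768-token context, this hypothetical stack caches 66,816 positions, compared with 393,216 if every layer were global. That is about 17%, and the gap grows with context length because SWA layers stay capped at 128 positions each.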
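The abstract says that domain-specialized teachers give the student dense, token-level rewards, but it does not state the reward formula. One common instantiation in on-policy distillation is assumed below and is not taken from the paper. Each token the student itself sampled is scored by the gap between the teacher's and the student's log-probability for that token, which gives a per-token reverse-KL estimate. The domain-keyed teacher lookup and all function names are hypothetical.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_rewards(student_logits: list[list[float]],
                  teacher_logits: list[list[float]],
                  tokens: list[int]) -> list[float]:
    """Dense reward for each token of one student-sampled response.

    r_t = log p_teacher(y_t | y_<t) - log p_student(y_t | y_<t)
    (assumed formulation). Row t of each logits matrix is that model's output
    at position t on the same student response; tokens[t] is the sampled token.
    """
    return [log_softmax(t_row)[y] - log_softmax(s_row)[y]
            for s_row, t_row, y in zip(student_logits, teacher_logits, tokens)]


def mopd_rewards(domain: str,
                 student_logits: list[list[float]],
                 teacher_logits_by_domain: dict[str, list[list[float]]],
                 tokens: list[int]) -> list[float]:
    """Score a response with the teacher for its domain (hypothetical routing)."""
    return token_rewards(student_logits, teacher_logits_by_domain[domain], tokens)


if __name__ == "__main__":
    student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # uniform student at both positions
    teachers = {
        # hypothetical "math" teacher outputs on the same two positions
        "math": [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]],
    }
    print(mopd_rewards("math", student, teachers, [0, 1]))
    # first reward > 0 (teacher favors token 0), second < 0 (teacher disfavors token 1)
```

Positive rewards push the student toward tokens that its domain teacher rates as more likely. In a full pipeline these per-token values would feed a policy-gradient update, which this sketch omits.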
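The abstract reports speculative decoding with the MTP layers serving as the draft model. Below is a minimal sketch of greedy verification. It assumes that "acceptance length" counts every token committed per main-model forward pass, meaning the accepted draft tokens plus the one token the main model always contributes. Under that convention, three draft layers cap a single step at 4 tokens. The function names are hypothetical. Sampling-based speculative decoding replaces the exact-match test used here with a rejection-sampling acceptance rule.

```python
from typing import Sequence


def verify_greedy(draft: Sequence[int], target: Sequence[int]) -> list[int]:
    """One verification step of greedy speculative decoding.

    draft:  the k tokens proposed by the draft model (e.g. k MTP layers).
    target: the k + 1 tokens the main model picks greedily when it scores the
            prefix plus the draft in one forward pass; target[i] is its choice
            at draft position i, target[k] the token after the whole draft.
    Returns the committed tokens: the longest prefix on which draft and target
    agree, plus one main-model token (a correction at the first mismatch, or a
    bonus token when every draft token was accepted).
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must contain exactly len(draft) + 1 tokens")
    n = 0
    while n < len(draft) and draft[n] == target[n]:
        n += 1
    return list(draft[:n]) + [target[n]]


def mean_acceptance_length(steps: Sequence[tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average number of tokens committed per main-model forward pass."""
    return sum(len(verify_greedy(d, t)) for d, t in steps) / len(steps)


if __name__ == "__main__":
    steps = [
        ([5, 7, 9], [5, 7, 9, 2]),  # all 3 drafts accepted + bonus -> 4 tokens
        ([5, 7, 9], [5, 8, 1, 3]),  # first draft accepted + correction -> 2 tokens
        ([4, 4, 4], [1, 2, 3, 4]),  # nothing accepted, correction only -> 1 token
    ]
    print(mean_acceptance_length(steps))  # 2.333... (7 tokens over 3 passes)
```

Because each step costs one main-model pass plus the draft passes, the wall-clock speedup is lower than the acceptance length. The abstract's figures of 3.6 and 2.6x reflect this gap.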
