2601.02780v2 Jan 06, 2026 cs.CL

MiMo-V2-Flash Technical Report

Bofei Gao

Citations: 1,456

h-index: 6

Houbin Zhang

Citations: 215

h-index: 3

Bo Yang

Citations: 538

h-index: 8

Chun Chen

Citations: 244

h-index: 4

Yizhao Gao

Citations: 160

h-index: 3

Jianyu Wei

Citations: 171

h-index: 4

Qihao Zhang

Citations: 95

h-index: 4

Yu Cheng

Citations: 162

h-index: 3

Shimao Chen

Citations: 302

h-index: 7

Zheng-Yu Tang

Citations: 272

h-index: 5

Zi-Ang Jiang

Citations: 196

h-index: 4

Yi-Hao Song

Citations: 318

h-index: 5

Shijie Cao

Citations: 203

h-index: 6

Xi Xiao

Citations: 121

h-index: 4

Bing Xia

Citations: 72

h-index: 2

Bowen Shen

Citations: 274

h-index: 4

Chen Zhang

Citations: 76

h-index: 2

Chenhong He

Citations: 284

h-index: 5

Chiheng Lou

Citations: 92

h-index: 4

Fuli Luo

Citations: 97

h-index: 3

Gang Wang

Citations: 454

h-index: 7

Gang Xie

Citations: 94

h-index: 4

Hailin Zhang

Citations: 384

h-index: 5

Hanglong Lv

Citations: 214

h-index: 4

Hanyu Li

Citations: 171

h-index: 4

Heyu Chen

Citations: 202

h-index: 3

Hong-Mei Xu

Citations: 244

h-index: 4

Huaqiu Liu

Citations: 296

h-index: 5

Jiangshan Duo

Citations: 167

h-index: 3

Jiebao Xiao

Citations: 274

h-index: 4

Jinhao Dong

Citations: 274

h-index: 4

Jun-Miao Shi

Citations: 193

h-index: 3

J. Hu

Citations: 154

h-index: 2

Kainan Bao

Citations: 274

h-index: 4

Kang Zhou

Citations: 82

h-index: 3

Lei Li

Citations: 74

h-index: 2

Liang Zhao

Citations: 71

h-index: 1

Linghao Zhang

Citations: 176

h-index: 3

Peidian Li

Citations: 274

h-index: 4

Qian Chen

Citations: 156

h-index: 2

Shao-yang Liu

Citations: 276

h-index: 4

Shi-liang Yu

Citations: 201

h-index: 4

Shouqiu Yu

Citations: 70

h-index: 1

Shuo Liu

Citations: 84

h-index: 2

Tian-Yu Zhou

Citations: 81

h-index: 2

Wei Su

Citations: 71

h-index: 1

Weikun Wang

Citations: 274

h-index: 4

Wenhan Ma

Peking University

Citations: 318

h-index: 5

Xia Deng

Citations: 280

h-index: 5

Bo Mao

Citations: 77

h-index: 2

Bowen Ye

Citations: 321

h-index: 6

C. Cai

Citations: 280

h-index: 5

Chenghua Wang

Citations: 84

h-index: 3

Chengxuan Zhu

Citations: 154

h-index: 2

Chunan Li

Citations: 154

h-index: 2

Dawei Zhu

Citations: 133

h-index: 2

Deshan Xiao

Citations: 76

h-index: 2

Dong Zhang

Citations: 203

h-index: 4

Duo Zhang

Citations: 274

h-index: 4

Fang Liu

Citations: 74

h-index: 2

Feiyu Yang

Citations: 73

h-index: 2

Feng Shi

Citations: 84

h-index: 2

Guoan Wang

Citations: 277

h-index: 4

Hao Tian

Citations: 228

h-index: 5

Hao Wu

Citations: 102

h-index: 2

Hengxu Qu

Citations: 303

h-index: 5

Hong Yi

Citations: 125

h-index: 3

Hongxu An

Citations: 76

h-index: 2

Hongyi Guan

Citations: 101

h-index: 2

Xing Zhang

Citations: 154

h-index: 2

Yi-Tong Yan

Citations: 70

h-index: 1

Ying Lai

Citations: 96

h-index: 2

Yu Tian

Citations: 72

h-index: 2

Yudong Wang

Citations: 70

h-index: 1

Zheng Wen

Citations: 94

h-index: 3

Zhichao Song

Citations: 281

h-index: 5

Zhixian Zheng

Citations: 189

h-index: 3

Jiantao Wen

Citations: 77

h-index: 2

Jiarui Sun

Citations: 80

h-index: 2

Jiawei Li

Citations: 277

h-index: 7

Jinlong Xue

Citations: 154

h-index: 2

Jun Xia

Citations: 70

h-index: 1

Kai Fang

Citations: 300

h-index: 5

Menghang Zhu

Citations: 274

h-index: 4

Nuo Chen

Citations: 71

h-index: 1

Qian Tu

Citations: 88

h-index: 2

Qiying Wang

Citations: 186

h-index: 6

Rang Li

Citations: 222

h-index: 5

Rui Ma

Citations: 208

h-index: 5

Shao-Qiang Zhang

Citations: 74

h-index: 2

Shengfan Wang

Citations: 161

h-index: 3

Shicheng Li

Citations: 291

h-index: 5

Shuhao Gu

Citations: 303

h-index: 5

Shu-Yue Ren

Citations: 73

h-index: 2

Sirui Deng

Citations: 279

h-index: 4

Tao Guo

Citations: 162

h-index: 3

Tianyang Lu

Citations: 78

h-index: 2

Weiji Zhuang

Citations: 325

h-index: 7

Weikang Zhang

Citations: 151

h-index: 5

Weimin Xiong

Citations: 194

h-index: 4

Wen-Jie Huang

Citations: 73

h-index: 2

Wenyu Yang

Citations: 155

h-index: 2

Xin Zhang

Citations: 73

h-index: 2

Xing Yong

Citations: 275

h-index: 4

Xu Wang

Citations: 274

h-index: 4

Xueyang Xie

Citations: 76

h-index: 2

Yilin Jiang

Citations: 160

h-index: 3

Yixin Yang

Citations: 154

h-index: 2

Yongzhe He

Citations: 154

h-index: 2

Yuanyu Tu

Citations: 72

h-index: 2

Yu-Jie Dong

Citations: 74

h-index: 2

Yuchen Liu

Citations: 88

h-index: 2

Yue Ma

Citations: 169

h-index: 3

Yue Yu

Citations: 296

h-index: 5

Yu-Cui Xiang

Citations: 92

h-index: 2

Zhaojun Huang

Citations: 157

h-index: 3

Zhenrui Lin

Citations: 288

h-index: 6

Zhipeng Xu

Citations: 70

h-index: 1

Zhiyang Chen

Citations: 78

h-index: 2

Zhonghua Deng

Citations: 92

h-index: 2

Zihan Zhang

Citations: 161

h-index: 3

Zihao Yue

Citations: 167

h-index: 4

Yihao Zhao

Citations: 703

h-index: 10

Chong Ma

Citations: 1,734

h-index: 20

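We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309 billion total parameters and 15 billion active parameters. MiMo-V2-Flash is designed for fast, strong reasoning and agentic capabilities. The model adopts a hybrid attention architecture that combines Sliding Window Attention (SWA) with global attention, using a 128-token sliding window at a 5:1 hybrid ratio. It was pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP) at a native context length of 32k, later extended to 256k. To scale post-training compute efficiently, MiMo-V2-Flash introduces a new Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teacher models (for example, models trained with large-scale reinforcement learning) give the student model dense, token-level rewards so that the student can acquire the teachers' expertise. MiMo-V2-Flash is competitive with top open-weight models such as DeepSeek-V3.2 and Kimi-K2 while using only 1/2 and 1/3 of their total parameters, respectively. At inference time, with MTP repurposed as the draft model for speculative decoding, MiMo-V2-Flash reaches an acceptance length of up to 3.6 and a decoding speedup of up to 2.6x with three MTP layers. The model weights and the three-layer MTP weights are released to encourage open research and community collaboration.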

Original Abstract

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, with a 128-token sliding window under a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), employing a native 32k context length and subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense and token-level reward, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves up to 3.6 acceptance length and 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
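The hybrid attention layout in the abstract (a 128-token sliding window, SWA and global layers at a 5:1 ratio) can be illustrated with a small sketch. This is not the authors' implementation. The function names, the ordering of layers within each group (five SWA layers followed by one global layer), and the convention that the window includes the query token are all assumptions made for illustration; the abstract states only the window size and the ratio.

```python
from typing import List

WINDOW = 128          # sliding-window size stated in the abstract
SWA_PER_GLOBAL = 5    # 5:1 hybrid ratio stated in the abstract


def layer_schedule(num_layers: int, swa_per_global: int = SWA_PER_GLOBAL) -> List[str]:
    """Return the attention type of each layer.

    Assumption: each group is `swa_per_global` SWA layers followed by one
    global layer; the actual ordering in the report may differ.
    """
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def can_attend(query_pos: int, key_pos: int, kind: str, window: int = WINDOW) -> bool:
    """Causal visibility of key_pos from query_pos for a given layer type.

    Assumption: the window counts the query token itself, so an SWA layer
    sees positions query_pos - window + 1 through query_pos.
    """
    if key_pos > query_pos:
        return False  # causal attention never looks ahead
    if kind == "global":
        return True   # global layers see the full causal prefix
    return query_pos - key_pos < window


def visible_keys(query_pos: int, kind: str, window: int = WINDOW) -> int:
    """Count the key positions a query at query_pos can attend to."""
    return sum(can_attend(query_pos, k, kind, window) for k in range(query_pos + 1))


if __name__ == "__main__":
    # 12 layers is a hypothetical depth chosen only to show two full groups.
    print(layer_schedule(12))
    print(visible_keys(1000, "swa"), visible_keys(1000, "global"))
```

Under these assumptions, a query at position 1000 attends to 128 keys in an SWA layer and to 1001 keys in a global layer, which is why most layers in such a layout need to cache only a fixed-size window of keys and values regardless of context length.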

71 Citations

8 Influential

10 Altmetric

137.0 Score
