2602.10604v2
Feb 11, 2026
cs.CL
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Luck Ma, Zheng Ge, R. Han, Yin Zhao, Heng Wang, Xin Li, Di Qi, Chen Hu, Ang Li, Zhe Xie, Ailin Huang (Megvii Research), Bin Wang, Bo Dong, Bo Wang, Boyu Chen, Brian Li, Buyun Ma, Chang Su, Chao Lou, Chen Xu, Da Shi, Dehua Ma, Enle Liu, Gulin Yan, Hao Nie, Haoran Lv, He Lv, H. Shum, Huan Zhu, H. Guo, Jia Wang, J. Wu, Jie Luo, Jie Yang, Jie Zhou, Jing Bai, Jin Xie, Kaibo Liu, Kang An, Lei Yang, Liang Lv, Liguo Tan, Lin Lin, Ming Li, Na Wang, Peng Liu, Qi He, Q. Du, Quan Sun, Rong Yang, Ruosi Wan, S. Fan, S. Yang, Siqi Liu, Siye Wu, Siyu Chen, Wang You, Wei Ji, Wei Yuan, Weibo Wu, W. Zheng, Wuxun Xie, Xin Wu, Xing Chen, Xuan He, Xu Feng, Yanbo Yu, Yang Li, Yang Xu, Yibo Zhu, Yu Zhou, Yuang Peng (Tsinghua University), Yue Peng, Zexi Li, Ziqi Ren, Xin Liu, Y. Guan, Jie Hou
Abstract
We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
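The abstract names an interleaved 3:1 sliding-window/full attention layout but does not give its parameters. The following is a minimal sketch, assuming that every fourth layer uses full causal attention and the other three use causal sliding-window attention. The layer count (8), window size (4096), context length (32768), and the placement of the full layer at the end of each group of four are hypothetical values chosen for illustration, not figures from the paper. The helpers `layer_schedule`, `attention_mask`, and `kv_cache_positions` are likewise illustrative names, not the model's code. The sketch builds the per-layer schedule and masks, then counts how many key/value positions a decoder would keep cached.

```python
# Hypothetical sketch of a 3:1 interleaved sliding-window / full attention stack.
# Layer count, window size, context length, and the position of the full layer
# within each group are assumptions, not values taken from Step 3.5 Flash.

def layer_schedule(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Return the attention type per layer: `swa_per_full` sliding-window
    layers followed by one full-attention layer, repeated."""
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def attention_mask(seq_len: int, kind: str, window: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.
    Both kinds are causal; sliding-window layers also drop keys that are
    `window` or more positions behind the query."""
    return [
        [k <= q and (kind == "full" or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def kv_cache_positions(schedule: list[str], context_len: int, window: int) -> int:
    """Cached key/value positions summed over all layers for one sequence.
    Full layers keep the whole context; sliding-window layers keep at most `window`."""
    return sum(
        context_len if kind == "full" else min(context_len, window)
        for kind in schedule
    )


if __name__ == "__main__":
    schedule = layer_schedule(8)
    print(schedule)  # ['swa', 'swa', 'swa', 'full', 'swa', 'swa', 'swa', 'full']

    ctx, win = 32768, 4096  # hypothetical context length and window size
    cached = kv_cache_positions(schedule, ctx, win)
    full_only = kv_cache_positions(["full"] * len(schedule), ctx, win)
    print(f"{cached / full_only:.3f}")  # 0.344
```

Under these assumed numbers, the interleaved stack caches about a third of the key/value positions that an all-full-attention stack would, because only the full-attention layers grow with context length. This illustrates one way such a layout can lower the per-step cost of long, multi-round agentic sessions. The actual savings depend on the model's real window size and layer configuration.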