2505.09388 May 14, 2025 cs.AI

Qwen3 Technical Report

Yujia Liu · Citations: 0 · h-index: 0
Zeyu Cui · Citations: 17,708 · h-index: 11
K. Dang · Citations: 28,766 · h-index: 18
Yang Fan · Citations: 14,812 · h-index: 7
Fei Huang (Qwen Team, Alibaba Group) · Citations: 18,316 · h-index: 16
Mei Li · Citations: 17,698 · h-index: 8
Rui Men · Citations: 26,724 · h-index: 25
Jian Yang · Citations: 16,175 · h-index: 5
Zhenru Zhang · Citations: 18,372 · h-index: 10
An Yang · Citations: 13,945 · h-index: 14
Anfeng Li · Citations: 5,855 · h-index: 3
Baosong Yang · Citations: 13,430 · h-index: 11
Beichen Zhang · Citations: 11,749 · h-index: 7
Binyuan Hui · Citations: 14,005 · h-index: 13
Bo Zheng · Citations: 12,698 · h-index: 9
Bowen Yu · Citations: 14,775 · h-index: 19
Chang Gao · Citations: 8,233 · h-index: 7
Chengen Huang · Citations: 5,846 · h-index: 2
Chenxu Lv · Citations: 7,098 · h-index: 3
Chujie Zheng (Tsinghua University) · Citations: 10,258 · h-index: 25
Dayiheng Liu · Citations: 24,164 · h-index: 24
Fan Zhou · Citations: 6,205 · h-index: 4
Feng Hu · Citations: 5,959 · h-index: 5
Hao Ge · Citations: 5,843 · h-index: 1
Haoran Wei · Citations: 12,395 · h-index: 6
Huan Lin · Citations: 13,578 · h-index: 6
Jialong Tang · Citations: 8,679 · h-index: 7
Jianwei Zhang · Citations: 10,206 · h-index: 4
Jianxin Yang · Citations: 12,207 · h-index: 7
Jiaxin Yang · Citations: 10,274 · h-index: 9
Jingren Zhou · Citations: 15,205 · h-index: 27
Junyan Lin · Citations: 5,847 · h-index: 2
Keqin Bao · Citations: 10,092 · h-index: 6
Ke-Pei Yang · Citations: 5,850 · h-index: 2
Le Yu · Citations: 5,877 · h-index: 3
Li-Chun Deng · Citations: 5,850 · h-index: 2
Min Xue · Citations: 8,094 · h-index: 3
Mingze Li · Citations: 5,856 · h-index: 3
Pei Zhang · Citations: 12,327 · h-index: 5
Peng Wang · Citations: 5,855 · h-index: 2
Qin Zhu · Citations: 10,598 · h-index: 8
Ruize Gao · Citations: 8,231 · h-index: 5
Shixuan Liu · Citations: 8,175 · h-index: 6
Shuang Luo · Citations: 5,871 · h-index: 4
Tianhao Li · Citations: 12,329 · h-index: 5
Tianyi Tang · Citations: 5,972 · h-index: 5
Wenbiao Yin · Citations: 6,857 · h-index: 14
Xingzhang Ren · Citations: 15,052 · h-index: 9
Xinyu Wang · Citations: 5,860 · h-index: 3
Xinyu Zhang · Citations: 8,104 · h-index: 3
Xuancheng Ren · Citations: 19,361 · h-index: 9
Yang Su · Citations: 5,844 · h-index: 1
Yi-Chao Zhang · Citations: 10,039 · h-index: 3
Yinger Zhang · Citations: 5,860 · h-index: 2
Yu Wan · Citations: 5,856 · h-index: 3
Yuqiong Liu · Citations: 6,884 · h-index: 4
Zekun Wang · Citations: 11,059 · h-index: 12
Zhipeng Zhou · Citations: 5,877 · h-index: 3
Zihan Qiu · Citations: 10,315 · h-index: 6

Original Abstract

In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework. This eliminates the need to switch between different models--such as chat-optimized models (e.g., GPT-4o) and dedicated reasoning models (e.g., QwQ-32B)--and enables dynamic mode switching based on user queries or chat templates. Meanwhile, Qwen3 introduces a thinking budget mechanism, allowing users to allocate computational resources adaptively during inference, thereby balancing latency and performance based on task complexity. Moreover, by leveraging the knowledge from the flagship models, we significantly reduce the computational resources required to build smaller-scale models, while ensuring their highly competitive performance. Empirical evaluations demonstrate that Qwen3 achieves state-of-the-art results across diverse benchmarks, including tasks in code generation, mathematical reasoning, agent tasks, etc., competitive against larger MoE models and proprietary models. Compared to its predecessor Qwen2.5, Qwen3 expands multilingual support from 29 to 119 languages and dialects, enhancing global accessibility through improved cross-lingual understanding and generation capabilities. To facilitate reproducibility and community-driven research and development, all Qwen3 models are publicly accessible under Apache 2.0.
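The abstract says mode switching can be driven by user queries or chat templates but does not spell out the mechanism here. The following minimal sketch makes three assumptions. First, it uses hypothetical inline flags `/think` and `/no_think` in user or system messages. Second, the most recent flag wins across turns, and thinking mode is the default. Third, non-thinking turns carry an empty `<think>` block so that both modes share one output format. The flag names, tags, and function names are illustrative assumptions, not an API confirmed by this page.

```python
from typing import Dict, List

# Hypothetical soft-switch flags (assumed names, not taken from this page).
THINK_FLAG = "/think"
NO_THINK_FLAG = "/no_think"


def resolve_thinking_mode(messages: List[Dict[str, str]], default: bool = True) -> bool:
    """Return True for thinking mode, False for non-thinking mode.

    Flags are read from user and system messages in order, so the most recent
    flag wins; with no flag present, `default` is used.
    """
    mode = default
    for msg in messages:
        if msg["role"] not in ("user", "system"):
            continue  # assistant output never toggles the mode
        for token in msg["content"].split():
            if token == THINK_FLAG:
                mode = True
            elif token == NO_THINK_FLAG:
                mode = False
    return mode


def assistant_prefix(thinking: bool) -> str:
    # Assumption: non-thinking turns start with an empty think block so that
    # both modes share a single response format.
    return "<think>\n" if thinking else "<think>\n\n</think>\n\n"
```

For example, a conversation whose first user turn ends with `/no_think` and whose latest user turn ends with `/think` resolves to thinking mode for the next reply. Keeping a single model and toggling behavior through the prompt is what removes the need to route between a chat model and a separate reasoning model.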

Citations: 5,945 · Influential citations: 849 · Altmetric: 13.5 · Score: 7,710.5

AI Analysis

Summary

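Qwen3 is the latest series in the Qwen model family and includes dense and MoE (Mixture-of-Experts) models ranging from 0.6B to 235B parameters. Its central feature is that a 'Thinking Mode' for complex reasoning and a 'Non-thinking Mode' for fast responses are integrated into a single framework, so the model adapts to the user's needs without switching to a separate model. It also introduces a 'Thinking Budget' mechanism that lets users control how much compute is spent on reasoning. The models are pre-trained on 36 trillion tokens, and training efficiency is improved through 'Strong-to-Weak Distillation', which efficiently transfers knowledge from large models to small ones. As a result, Qwen3 achieves state-of-the-art (SOTA) performance on a wide range of benchmarks, including coding, math, and multilingual tasks, outperforming existing open-source models and some proprietary models.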
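Strong-to-Weak Distillation is described here only at a high level. One common way to transfer a teacher's behavior to a student is logit distillation. Both models score the same token positions, and training minimizes the KL divergence between their output distributions. The sketch below uses that technique with several assumptions. It applies forward KL(teacher || student) averaged over positions, with a hypothetical `temperature` parameter. Plain Python lists stand in for real model outputs. It illustrates the loss only and is not the report's training code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits: Sequence[Sequence[float]],
                      student_logits: Sequence[Sequence[float]],
                      temperature: float = 1.0) -> float:
    """Mean per-position KL(teacher || student) over a sequence of token positions."""
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must score the same positions")
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)
```

The loss is zero when the student reproduces the teacher's distribution at every position and grows as they diverge. In an on-policy variant, the positions would come from sequences the student generated itself. That setup is an assumption beyond what this page states.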

Key Innovations

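Integration of Thinking Mode (deep reasoning) and Non-thinking Mode (fast responses) in a single model
A Thinking Budget mechanism that lets users control the amount of inference compute
Strong-to-Weak Distillation, which efficiently transfers the capabilities of large models to small models
A four-stage post-training pipeline that combines reasoning ability with general conversational ability
Architecture changes that improve training stability (adding QK-Norm and removing QKV-bias)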
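The architecture item above names two changes: adding QK-Norm and removing QKV-bias. The following single-head sketch assumes RMSNorm as the normalization and already-given weight matrices. Queries and keys are projected without a bias term and RMS-normalized with learned per-dimension gains before the dot product. Because the normalized vectors have a fixed scale, the attention logit stays bounded however large the raw activations grow. That is one plausible reading of why the change helps training stability. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def project(x: Sequence[float], weight: Sequence[Sequence[float]]) -> List[float]:
    # Linear projection with no bias term (mirrors the removal of QKV-bias).
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> List[float]:
    # RMSNorm: divide by sqrt(mean(x^2) + eps), then apply a learned per-dimension gain.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, gain)]


def qk_norm_logit(q: Sequence[float], k: Sequence[float],
                  q_gain: Sequence[float], k_gain: Sequence[float]) -> float:
    """Scaled dot-product attention logit for one query/key pair with QK-Norm."""
    qn = rms_norm(q, q_gain)
    kn = rms_norm(k, k_gain)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))
```

Scaling `q` and `k` by 1000 leaves the logit essentially unchanged. Without the two `rms_norm` calls, the same scaling would multiply the logit by one million, which is the kind of blow-up that destabilizes softmax attention.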

Learning & Inference Impact

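On the training side, Strong-to-Weak Distillation lets small models efficiently learn the reasoning abilities of large models with roughly one-tenth of the GPU hours required by reinforcement learning (RL), while delivering larger performance gains than RL. On the inference side, users can set a 'Thinking Budget' to dynamically balance latency and performance according to task complexity. This removes the need to deploy a reasoning-only model (e.g., o1) and a chat model (e.g., GPT-4o) separately, substantially reducing system complexity and operating costs.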
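The following sketch shows one way to enforce a Thinking Budget during decoding. It assumes that thinking text arrives as a stream of tokens closed by a `</think>` marker. Once the budget is spent, any remaining thinking is dropped and a stop-thinking message plus `</think>` is appended, so decoding moves straight to the final answer. The marker, the message wording (`STOP_THINKING_MSG`), and the function name are hypothetical. A real system would feed the returned prefix back into the model rather than filter a fixed token list.

```python
from typing import Iterable, List, Tuple

END_THINK = "</think>"
# Hypothetical stop-thinking message; the wording is an assumption for illustration.
STOP_THINKING_MSG = "Considering the limited time, I have to answer based on the thinking so far."


def apply_thinking_budget(thinking_tokens: Iterable[str], budget: int) -> Tuple[List[str], bool]:
    """Consume thinking tokens until `</think>` or until `budget` tokens are kept.

    Returns (tokens to keep, truncated flag). When the budget is exhausted, the
    thinking is cut off and a stop-thinking message plus `</think>` is appended.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    kept: List[str] = []
    for tok in thinking_tokens:
        if tok == END_THINK:
            # The model finished thinking within budget.
            kept.append(tok)
            return kept, False
        if len(kept) >= budget:
            # Budget spent: force the transition to the answer.
            kept.extend([STOP_THINKING_MSG, END_THINK])
            return kept, True
        kept.append(tok)
    # Stream ended without closing the block; close it so the format stays valid.
    kept.append(END_THINK)
    return kept, False
```

Because the stream is consumed lazily, the function also works on an unbounded generator. A larger `budget` lets more reasoning through at the cost of latency, and `budget=0` skips reasoning while still emitting a well-formed think block.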

Technical Difficulty

Intermediate

Estimated implementation complexity based on methodology.
