2309.16609 Sep 28, 2023 cs.AI

Qwen Technical Report

Yujia Liu

Citations: 0

h-index: 0

Jinze Bai

Citations: 15,026

h-index: 13

Shuai Bai

Citations: 21,355

h-index: 20

Yunfei Chu

Citations: 3,937

h-index: 2

Zeyu Cui

Citations: 17,708

h-index: 11

K. Dang

Citations: 28,766

h-index: 18

Xiaodong Deng

Citations: 6,192

h-index: 5

Yang Fan

Citations: 14,812

h-index: 7

Wenhang Ge

Citations: 4,306

h-index: 10

Fei Huang

Qwen Team, Alibaba Group

Citations: 18,316

h-index: 16

Binyuan Hui

Alibaba DAMO Academy

Citations: 8,809

h-index: 23

Luo Ji

Alibaba Group

Citations: 4,350

h-index: 8

Mei Li

Citations: 17,698

h-index: 8

Junyang Lin

Citations: 17,556

h-index: 40

Runji Lin

Citations: 12,270

h-index: 11

Dayiheng Liu

Alibaba Group

Citations: 5,173

h-index: 18

Gao Liu

Citations: 3,923

h-index: 2

Chengqiang Lu

Citations: 5,089

h-index: 11

K. Lu

Citations: 4,329

h-index: 11

Jianxin Ma

Citations: 9,981

h-index: 24

Rui Men

Citations: 26,724

h-index: 25

Xingzhang Ren

Citations: 4,058

h-index: 4

Xuancheng Ren

Citations: 6,527

h-index: 27

Chuanqi Tan

Citations: 11,374

h-index: 41

Sinan Tan

Citations: 14,775

h-index: 11

Jianhong Tu

Citations: 6,407

h-index: 8

Peng Wang

Citations: 9,733

h-index: 13

Shijie Wang

Alibaba DAMO Academy

Citations: 8,131

h-index: 6

Wei Wang

Citations: 4,931

h-index: 13

Shengguang Wu

Citations: 3,997

h-index: 4

Benfeng Xu

Citations: 3,977

h-index: 2

Jin Xu

Citations: 3,958

h-index: 2

An Yang

Peking University

Citations: 5,287

h-index: 19

Hao Yang

Citations: 3,952

h-index: 3

Jian Yang

Beihang University, Beijing, China

Citations: 4,722

h-index: 19

Jian Yang

Citations: 16,175

h-index: 5

Shusheng Yang

Citations: 7,428

h-index: 9

Yang Yao

Citations: 6,142

h-index: 2

Bowen Yu

Citations: 8,943

h-index: 14

Yu Bowen

Institute of Information Engineering, Chinese Academy of Sciences

Citations: 7,449

h-index: 26

Hongyi Yuan

Citations: 6,219

h-index: 19

Zheng Yuan

Alibaba DAMO Academy, Tsinghua University

Citations: 6,968

h-index: 22

Jianwei Zhang

Citations: 5,857

h-index: 12

Xing Zhang

Citations: 3,994

h-index: 4

Yichang Zhang

Citations: 8,461

h-index: 19

Zhenru Zhang

Citations: 18,372

h-index: 10

Chang Zhou

Citations: 13,929

h-index: 13

Jingren Zhou

Citations: 26,959

h-index: 29

Xiaohuan Zhou

Citations: 8,768

h-index: 10

Tianhang Zhu

Citations: 6,200

h-index: 4

Original Abstract

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models.

3937 Citations

424 Influential

20.5 Altmetric

4,887.5 Score

AI Analysis

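Summary

This document is the technical report on Qwen, a series of large language models (LLMs) developed by Alibaba Group. The Qwen series consists of 1.8B-, 7B-, and 14B-parameter models pretrained on up to 3 trillion tokens of multilingual data. The models adopt a modified Transformer architecture based on LLaMA and are aligned with human preferences through SFT (supervised finetuning) and RLHF (reinforcement learning from human feedback). The report covers the development and evaluation of models specialized for coding (Code-Qwen) and mathematics (Math-Qwen), and presents benchmark results in which Qwen outperforms open-source models of comparable size (such as LLaMA 2) and approaches some proprietary models (GPT-3.5). It also highlights tool use and agent capabilities.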

Key Innovations

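Modified Transformer architecture: untied weights for the input embedding and output projection (untied embedding), RoPE in FP32 precision, biases removed (except for QKV), and the SwiGLU activation function
Inference-time context extension: NTK-aware interpolation, LogN-Scaling, and window attention, applied to handle long contexts without additional training
Refined alignment pipeline: SFT on manually annotated data, followed by RLHF with a reward model that first undergoes PMP (preference model pretraining)
Specialized models: Code-Qwen and Math-Qwen derivatives optimized for code data and mathematical problem solving
Improved tokenizer: a 152K-entry vocabulary built for multilingual efficiency, achieving high compression rates across many languages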
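
Two of the architecture items listed above can be illustrated with a minimal NumPy sketch: a bias-free SwiGLU feed-forward block, and the parameter cost of keeping a separate output projection instead of tying it to the input embedding. The function names, the toy dimensions, and the hidden size of 4096 are assumptions made for illustration; they are not taken from the report, and this is not the authors' implementation.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Bias-free SwiGLU feed-forward block.

    x: (tokens, hidden); w_gate, w_up: (hidden, inner); w_down: (inner, hidden).
    No bias vectors are added anywhere in the block.
    """
    gated = silu(x @ w_gate) * (x @ w_up)  # gate branch modulates the up branch
    return gated @ w_down


def untied_extra_params(vocab_size: int, hidden_size: int) -> int:
    """Extra parameters from keeping a separate output projection matrix."""
    return vocab_size * hidden_size


# Hypothetical toy sizes, chosen only to keep the example small.
hidden, inner, tokens = 8, 16, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((tokens, hidden))
w_gate = rng.standard_normal((hidden, inner))
w_up = rng.standard_normal((hidden, inner))
w_down = rng.standard_normal((inner, hidden))

y = swiglu_ffn(x, w_gate, w_up, w_down)
print(y.shape)  # (4, 8)

# 152_000 approximates the "152K" vocabulary; 4096 is an assumed hidden size.
print(untied_extra_params(152_000, 4096))  # 622592000
```

Under these assumed sizes, untying adds roughly 0.6B parameters, which is the kind of memory cost that the untied-embedding choice trades for accuracy.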

Learning & Inference Impact

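During training, Flash Attention is used to reduce memory usage and speed up training. Architecturally, untying the input embedding and output projection weights increases memory cost but improves model performance. At inference, the most notable feature is the use of NTK-aware interpolation and LogN-Scaling, which let the model handle contexts (8192 tokens or more) much longer than the training length (2048 tokens). Because these techniques are applied only at inference time, long-document handling is gained without retraining. In addition, the multilingually optimized tokenizer reduces the cost of token generation at inference and improves how much information each token carries.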
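
The two inference-time techniques named above can be sketched as follows. This uses the commonly cited community formulation of NTK-aware scaling (enlarging the RoPE base by `scale ** (d / (d - 2))`) and a LogN factor of `log(position) / log(train_len)` beyond the training length. Whether the report uses exactly these formulas is an assumption; the RoPE base of 10000 and head dimension of 128 are also assumed values, and all function names are hypothetical.

```python
import math

import numpy as np

TRAIN_LEN = 2048      # training context length stated in the summary
ROPE_BASE = 10000.0   # conventional RoPE base (assumption)
HEAD_DIM = 128        # hypothetical per-head dimension


def rope_inv_freq(head_dim: int, base: float) -> np.ndarray:
    """RoPE inverse frequencies, one per pair of dimensions, in float32."""
    exponents = np.arange(0, head_dim, 2, dtype=np.float32) / head_dim
    return (1.0 / (base ** exponents)).astype(np.float32)


def ntk_scaled_base(base: float, head_dim: int, target_len: int,
                    train_len: int = TRAIN_LEN) -> float:
    """Enlarge the RoPE base so the lowest frequency is stretched by the
    length ratio while the highest frequency stays unchanged."""
    scale = max(1.0, target_len / train_len)
    return base * scale ** (head_dim / (head_dim - 2))


def logn_scale(position: int, train_len: int = TRAIN_LEN) -> float:
    """Query scaling factor for a 1-based position; 1.0 within train_len."""
    return math.log(position, train_len) if position > train_len else 1.0


target_len = 8192
orig = rope_inv_freq(HEAD_DIM, ROPE_BASE)
scaled = rope_inv_freq(HEAD_DIM, ntk_scaled_base(ROPE_BASE, HEAD_DIM, target_len))

print(orig[0], scaled[0])      # highest frequency: identical
print(orig[-1] / scaled[-1])   # lowest frequency: divided by about 4
print(logn_scale(target_len))  # about 13/11
```

Neither function touches the model weights, which is why both can be applied at inference time without retraining.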

Technical Difficulty

Intermediate

Estimated implementation complexity based on methodology.
