2308.12950 Aug 24, 2023 cs.AI

Code Llama: Open Foundation Models for Code

Hugo Touvron (Citations: 66,040; h-index: 17)
Baptiste Rozière (Citations: 26,660; h-index: 21)
Faisal Azhar (Citations: 23,801; h-index: 4)
Cristian Canton Ferrer (Citations: 38,327; h-index: 15)
Jade Copet (Citations: 24,013; h-index: 22)
Joanna Bitton (Citations: 21,028; h-index: 12)
Sten Sootla (Citations: 19,726; h-index: 12)
Thomas Scialom (Citations: 39,271; h-index: 17)
Aaron Grattafiori (Citations: 19,704; h-index: 11)
Itai Gat, Meta (Citations: 22,081; h-index: 22)
Nicolas Usunier (Citations: 55,738; h-index: 46)
Tal Remez (Citations: 22,001; h-index: 23)
Jonas Gehring (Citations: 7,847; h-index: 12)
Fabian Gloeckle (Citations: 3,772; h-index: 7)
Xiaoqing Tan, University of Pittsburgh (Citations: 3,612; h-index: 12)
Yossi Adi (Citations: 27,063; h-index: 34)
Jingyu Liu (Citations: 3,928; h-index: 10)
J. Rapin (Citations: 4,270; h-index: 18)
Artyom Kozhevnikov (Citations: 3,640; h-index: 4)
I. Evtimov (Citations: 7,267; h-index: 14)
Manish P Bhatt (Citations: 3,241; h-index: 2)
Wenhan Xiong (Citations: 7,415; h-index: 24)
Alexandre Défossez (Citations: 4,055; h-index: 6)
Louis Martin, Facebook AI Research (Citations: 22,677; h-index: 11)
Gabriel Synnaeve (Citations: 63,500; h-index: 58)

Original Abstract

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, 34B and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B, 13B and 70B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 67% and 65% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use.

3,248 citations (376 influential) · Altmetric 29 · Score 4,145.0

AI Analysis

Summary

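Proposes Code Llama, a family of open-source LLMs for code generation and understanding that Meta AI developed on top of Llama 2. The models come in 7B, 13B, 34B, and 70B parameter sizes and in three variants: a base model, a Python-specialized model, and an instruction-following (Instruct) model. They support code infilling (filling in the middle of a code snippet; available in the 7B, 13B, and 70B base and Instruct variants), long-context processing of up to 100k tokens, and zero-shot instruction following, and they achieve state-of-the-art performance among open-source models on benchmarks such as HumanEval and MBPP.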

Key Innovations

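Staged training pipeline initialized from Llama 2 weights: code training first, followed by either Python specialization (Code Llama - Python) or instruction fine-tuning (Code Llama - Instruct)
Long Context Fine-Tuning (LCFT): adjusting the base period of the RoPE (Rotary Positional Embeddings) frequencies so the models can handle inputs of up to 100k tokens
Infilling training objective that teaches the model to predict a missing middle section of code from its surrounding context (7B, 13B, and 70B models)
Self-instruct method: training on self-generated data that is filtered using unit tests and execution feedback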
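A minimal sketch of how a training document can be turned into an example for the infilling objective listed above. Following the fill-in-the-middle recipe the paper adopts, the document is split at two uniformly random character offsets into prefix, middle, and suffix, then rearranged in prefix-suffix-middle (PSM) order so that a left-to-right model learns to generate the middle after seeing both sides. The sentinel strings `<PRE>`, `<SUF>`, `<MID>`, `<EOT>` are illustrative placeholders for the tokenizer's special tokens, the exact prompt spacing of the released models is not reproduced, and the function names are hypothetical. The paper also uses a suffix-prefix-middle (SPM) variant, which this sketch omits.

```python
import random

# Illustrative placeholders: the real model uses dedicated special tokens in
# its tokenizer, not these literal strings.
PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"


def split_document(doc: str, rng: random.Random) -> tuple[str, str, str]:
    """Split doc at two uniformly sampled character offsets into (prefix, middle, suffix)."""
    a, b = sorted(rng.randint(0, len(doc)) for _ in range(2))
    return doc[:a], doc[a:b], doc[b:]


def to_psm(prefix: str, middle: str, suffix: str) -> str:
    """Training sequence in prefix-suffix-middle order: the model reads both
    surrounding parts first, then learns to emit the middle followed by EOT."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}{EOT}"


def infill_prompt(prefix: str, suffix: str) -> str:
    """Inference prompt: the same layout with the middle left for the model to generate."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


if __name__ == "__main__":
    rng = random.Random(0)
    doc = "def add(a, b):\n    return a + b\n"
    prefix, middle, suffix = split_document(doc, rng)
    print(repr(to_psm(prefix, middle, suffix)))
    print(repr(infill_prompt(prefix, suffix)))
```

Because the inference prompt is exactly the training sequence truncated before the middle, an editor can send the text before and after the cursor and let the model fill the gap.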
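A minimal sketch of the test-based filtering step behind the self-instruct method listed above. According to the paper, Code Llama 7B generates unit tests and ten candidate Python solutions for each interview-style question, and the first candidate that passes the tests is added to the instruction data. Model generation is out of scope here: candidates and tests are given as plain strings, and the function names are hypothetical. `exec` is used only for brevity and is not a sandbox; a real pipeline would run untrusted model output in an isolated process with a timeout.

```python
from typing import Optional


def passes_tests(solution: str, tests: str) -> bool:
    """Run a candidate solution, then its unit tests, in a fresh namespace.

    Any exception (syntax error, runtime error, failed assert) counts as failure.
    WARNING: exec() is not a sandbox and does not guard against infinite loops.
    """
    namespace: dict = {}
    try:
        exec(solution, namespace)
        exec(tests, namespace)
    except Exception:
        return False
    return True


def first_passing_solution(solutions: list[str], tests: str) -> Optional[str]:
    """Return the first candidate that passes all tests, or None if none do."""
    for candidate in solutions:
        if passes_tests(candidate, tests):
            return candidate
    return None


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
    candidates = [
        "def add(a, b):\n    return a - b\n",  # buggy candidate, fails the tests
        "def add(a, b):\n    return a + b\n",  # correct candidate
    ]
    print(first_passing_solution(candidates, tests))
```

Keeping only the first passing candidate, rather than all of them, yields at most one solution per question, which keeps the dataset from being dominated by near-duplicate answers.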

Learning & Inference Impact

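Training includes a separate fine-tuning stage for long sequences, designed so that the model, while seeing only 16k-token sequences during training, can extrapolate to inputs of up to 100k tokens at inference time. This extends the usable context without the cost of training on 100k-token sequences, and lets the model understand and reference context beyond a single file, at the level of an entire repository. Integrating infilling also takes the model beyond plain left-to-right code generation, making it considerably more useful at inference time as an assistant for code editing and autocompletion in IDEs.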
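A minimal sketch, in plain Python, of rotary position embeddings (RoPE) with a configurable base period, the quantity the long-context fine-tuning stage changes. The paper raises the base period from 10,000 (the Llama 2 value) to 1,000,000. The helper names and the per-head dimension of 128 (matching Llama 2's attention heads) are illustrative choices, and this is not Meta's implementation. It shows two properties: RoPE rotates each pair of query/key dimensions by an angle proportional to the token position, so attention scores depend only on relative offsets; and a larger base slows the slowest-rotating pair so much that it no longer completes a full turn within 100k tokens. That second point is one intuitive view, not the paper's full argument for why long-context extrapolation works.

```python
import math


def rope_inv_freq(dim: int, base: float) -> list[float]:
    """Per-pair rotation frequencies theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rope_rotate(vec: list[float], pos: int, base: float) -> list[float]:
    """Rotate each consecutive pair (x[2i], x[2i+1]) of vec by angle pos * theta_i."""
    freqs = rope_inv_freq(len(vec), base)
    out: list[float] = []
    for i, theta in enumerate(freqs):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def longest_wavelength(dim: int, base: float) -> float:
    """Number of positions the slowest-rotating pair needs for one full turn."""
    return 2 * math.pi / rope_inv_freq(dim, base)[-1]


if __name__ == "__main__":
    head_dim = 128  # assumption: per-head dimension of Llama 2 attention
    for base in (10_000.0, 1_000_000.0):
        wavelength = longest_wavelength(head_dim, base)
        print(f"base={base:>12,.0f}  longest wavelength ~ {wavelength:,.0f} tokens")
```

With the smaller base, the slowest pair wraps around well before 100k positions, so distant positions can look alike to that dimension; with the larger base the same pair rotates slowly enough to stay distinguishable across the whole range.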

Technical Difficulty

Intermediate

Estimated implementation complexity based on methodology.
