2403.08295 Mar 13, 2024 cs.AI

Gemma: Open Models Based on Gemini Research and Technology

O. Vinyals

Citations: 260,727

h-index: 103

Armand Joulin

Citations: 93,772

h-index: 71

R. Comanescu

Citations: 10,080

h-index: 12

D. Hassabis

Citations: 188,933

h-index: 91

K. Kavukcuoglu

Citations: 230,975

h-index: 76

Sebastian Borgeaud

Citations: 28,861

h-index: 20

Katie Millican

Citations: 21,670

h-index: 10

T. Hennigan

Citations: 14,219

h-index: 9

Elena Buchatskaya

Citations: 24,308

h-index: 12

L. Sifre

Citations: 58,228

h-index: 28

Jean-Baptiste Lespiau

Citations: 12,712

h-index: 18

J. Stanway

Citations: 10,768

h-index: 10

Machel Reid

Google DeepMind

Citations: 16,751

h-index: 19

Rohan Anil

Google Brain

Citations: 16,398

h-index: 21

Ross McIlroy

Citations: 7,207

h-index: 7

Eli Collins

Citations: 10,372

h-index: 8

H. Michalewski

Citations: 27,239

h-index: 25

James Keeling

Citations: 10,470

h-index: 8

Siamak Shakeri

Citations: 12,736

h-index: 20

A. Chowdhery

Citations: 29,853

h-index: 30

Oscar Chang

Citations: 7,488

h-index: 8

George Tucker

Citations: 7,191

h-index: 6

Ambrose Slone

Citations: 13,102

h-index: 9

J Christopher Love

Citations: 7,327

h-index: 7

R. Chaabouni

Citations: 7,975

h-index: 16

J. Mao-Jones

Citations: 7,300

h-index: 5

Charline Le Lan

Citations: 11,067

h-index: 16

Paul Michel

Citations: 9,387

h-index: 9

Justin Chiu

Citations: 7,235

h-index: 7

Lisa Lee

Citations: 7,161

h-index: 4

Evan Senter

Citations: 10,429

h-index: 9

Mateo Wirth

Citations: 9,311

h-index: 8

Zafarali Ahmed

Citations: 7,189

h-index: 6

Eric Noland

Citations: 15,194

h-index: 11

Jenny Brennan

Citations: 9,940

h-index: 8

Sholto Douglas

Citations: 7,826

h-index: 8

Wojciech Stokowiec

Citations: 7,603

h-index: 9

Jane Labanowski

Citations: 7,167

h-index: 4

Minh Giang

Citations: 10,409

h-index: 8

Vladimir Feinberg

Citations: 10,645

h-index: 11

Jeremy Chen

Citations: 7,167

h-index: 4

Ruibo Liu

Google DeepMind

Citations: 6,464

h-index: 22

Michael Sharman

Citations: 5,051

h-index: 8

Alek Andreev

Citations: 10,591

h-index: 8

David Reid

Citations: 7,183

h-index: 5

Gemma Team, Thomas Mesnard

Citations: 1,001

h-index: 1

Cassidy Hardin

Citations: 7,193

h-index: 9

Robert Dadashi

Citations: 8,257

h-index: 20

Surya Bhupatiraju

Citations: 7,543

h-index: 11

Shreya Pathak

Citations: 7,221

h-index: 9

Morgane Rivière

Citations: 5,275

h-index: 6

Mihir Kale

Citations: 8,470

h-index: 18

P. Tafti

Citations: 7,502

h-index: 15

Léonard Hussenot

Citations: 2,391

h-index: 16

Adam Roberts

Citations: 1,017

h-index: 3

Aditya Barua

Citations: 5,000

h-index: 7

Alex Botev

Citations: 1,031

h-index: 3

Alex Castro-Ros

Citations: 3,713

h-index: 4

Amélie Héliou

Citations: 1,311

h-index: 9

Andrea Tacchetti

Citations: 5,444

h-index: 17

Anna Bulanova

Citations: 7,212

h-index: 7

Antonia Paterson

Citations: 4,255

h-index: 6

Beth Tsai

Citations: 1,025

h-index: 3

Bobak Shahriari

Citations: 13,327

h-index: 15

Christopher A. Choquette-Choo

Google DeepMind

Citations: 10,995

h-index: 25

Clément Crepy

Citations: 1,350

h-index: 5

Daniel Cer

Citations: 1,294

h-index: 7

Daphne Ippolito

Citations: 18,577

h-index: 32

Eric Ni

Citations: 3,702

h-index: 4

Geng Yan

Citations: 3,721

h-index: 5

George-Christian Muraru

Citations: 1,023

h-index: 3

Grigory Rozhdestvenskiy

Citations: 3,692

h-index: 3

Ian Tenney

Citations: 4,617

h-index: 9

Ivan Grishchenko

Citations: 2,320

h-index: 7

Jacob Austin

Citations: 16,955

h-index: 10

Johan Ferret

Citations: 8,368

h-index: 18

Katherine Lee

Citations: 4,597

h-index: 8

Kathy Yu

Citations: 2,261

h-index: 3

Lars Lowe Sjoesund

Citations: 5,641

h-index: 5

Lucas Dixon

Google DeepMind

Citations: 9,106

h-index: 19

Maciej Mikuła

Citations: 3,692

h-index: 3

Nikolai Chinaev

Citations: 3,696

h-index: 4

Nithum Thain

Citations: 4,891

h-index: 20

Olivier Bachem

Citations: 14,127

h-index: 41

Oscar Wahltinez

Citations: 3,154

h-index: 7

Paige Bailey

Citations: 2,620

h-index: 5

Petko Yotov

Citations: 1,012

h-index: 2

Pier Giuseppe Sessa

Citations: 7,415

h-index: 16

Reena Jana

Citations: 2,964

h-index: 4

Ryan Mullins

Citations: 7,160

h-index: 10

Samuel L. Smith

Citations: 1,276

h-index: 4

Sertan Girgin

Citations: 9,490

h-index: 23

Shree Pandya

Citations: 1,024

h-index: 3

Soham De

Citations: 1,475

h-index: 9

Ted Klimenko

Citations: 1,036

h-index: 4

Zhitao Gong

Citations: 3,285

h-index: 7

Tris Warkentin

Citations: 7,979

h-index: 12

Ludovic Peran

Citations: 3,172

h-index: 7

Clément Farabet

Citations: 10,800

h-index: 9

Jeffrey Dean

Citations: 15,482

h-index: 7

Z. Ghahramani

Citations: 9,485

h-index: 21

Douglas Eck

Citations: 1,019

h-index: 3

Joelle Barral

Citations: 4,223

h-index: 6

Fernando Pereira

Citations: 3,690

h-index: 3

Noah Fiedel

Citations: 24,079

h-index: 20

Kathleen Kenealy

Citations: 6,306

h-index: 9

Yu-Hui Chen

Citations: 2,307

h-index: 10

Original Abstract

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

1011 Citations

127 Influential

30 Altmetric

1,415.0 Score

AI Analysis

Summary

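This paper introduces Gemma, a model family developed by Google DeepMind. Gemma is a family of lightweight open models built on the research and technology behind Google's Gemini models. It comes in two sizes, 2 billion (2B) and 7 billion (7B) parameters, and both pretrained and instruction-tuned checkpoints have been released. Gemma was trained on large-scale text data (up to 6 trillion tokens) and outperforms similarly sized open models such as LLaMA-2 and Mistral on 11 of 18 benchmarks spanning language understanding, reasoning, coding, and mathematics. The paper also emphasizes responsible AI deployment through rigorous safety evaluations and RLHF (reinforcement learning from human feedback).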

Key Innovations

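Architecture and training recipe inherited from the Gemini models
Multi-query attention (MQA) in the 2B model for better on-device efficiency (the 7B model keeps multi-head attention)
Rotary positional embeddings (RoPE) in place of absolute positional embeddings, plus the GeGLU activation function
A large 256k-token vocabulary and an 8192-token context length
A tuning pipeline that combines SFT (supervised fine-tuning) with RLHF (reinforcement learning from human feedback)
Large-scale distributed training and optimization on TPUv5e with JAX/Pathways infrastructure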
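Two of the items above, RoPE and GeGLU, are easier to follow with a small worked example. The following is a minimal sketch in plain Python, not Gemma's implementation: the function names, the tiny 4-dimensional vectors, the interleaved pairing of dimensions, the base of 10000, and the use of the exact (erf-based) GELU are all assumptions chosen for illustration. It shows that RoPE makes the query-key dot product depend only on the relative offset between positions, and that GeGLU gates one projection with the GELU of another.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def geglu(gate: list[float], value: list[float]) -> list[float]:
    """GeGLU gating: GELU(gate) multiplied elementwise with value.

    In a feed-forward layer, `gate` and `value` would be two separate
    linear projections of the same input; they are passed in directly
    here to keep the sketch short (an illustrative simplification).
    """
    return [gelu(g) * v for g, v in zip(gate, value)]


def rope(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate each pair (x[i], x[i+1]) by the angle position * base**(-i/d).

    Assumes the interleaved pairing convention; other implementations
    pair dimension i with dimension i + d/2 instead.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


# Hypothetical query and key vectors.
q = [1.0, 0.0, 0.5, -0.5]
k = [0.3, 0.7, -0.2, 0.1]

# Both pairs of positions are 2 apart, so the attention scores agree
# up to floating-point error.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
print(abs(score_a - score_b) < 1e-9)

# A gate of 0 closes the unit: GELU(0) = 0, so the output is 0.
print(geglu([0.0, 1.0], [2.0, 3.0]))
```

Because the rotation encodes position in the angle rather than in an added vector, no separate table of position embeddings is needed, which is the contrast with absolute positional embeddings drawn in the list.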

Learning & Inference Impact

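For training, the models were sharded and the data replicated across thousands of chips using TPUv5e and JAX/Pathways, which secured training speed and stability. The 2B model adopts multi-query attention (MQA), which reduces KV-cache memory use and speeds up inference, making it well suited to CPU and mobile (on-device) deployment. The 7B model, by contrast, retains multi-head attention to maximize quality and targets high-performance deployment on GPUs and TPUs. The large 256k vocabulary helps multilingual handling and compression efficiency, but it also adds to model size. Training-data filtering and the RLHF stage played a decisive role in reducing hallucinations and improving safety.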
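The memory effects described above can be checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate under stated assumptions, not a measurement: the layer count, head count, head dimension, model width, and 2-byte element size are hypothetical values chosen for illustration, while the 8192-token context and the 256k vocabulary (rounded to 256,000) come from the summary. The helper names are invented for this example.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 counts the key tensor and the value tensor.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Number of parameters in a token-embedding table."""
    return vocab_size * d_model


# Hypothetical small-model configuration (assumed, not from this page).
LAYERS = 18
QUERY_HEADS = 8
HEAD_DIM = 256
D_MODEL = 2048

# Values stated in the summary.
CONTEXT = 8192
VOCAB = 256_000  # "256k", rounded

# Multi-head attention keeps one key/value head per query head;
# multi-query attention shares a single key/value head.
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)
mqa = kv_cache_bytes(LAYERS, 1, HEAD_DIM, CONTEXT)

print(f"MHA cache: {mha / 2**20:.0f} MiB")  # 1152 MiB
print(f"MQA cache: {mqa / 2**20:.0f} MiB")  # 144 MiB
print(f"Embedding table: {embedding_params(VOCAB, D_MODEL):,} parameters")
```

Under these assumptions the cache shrinks by exactly the number of query heads (8x here), which is why MQA matters most on memory-constrained devices, and the embedding table alone accounts for roughly half a billion parameters, which illustrates how a large vocabulary adds to model size.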

Technical Difficulty

Intermediate

Estimated implementation complexity based on methodology.
