2403.05530
Mar 08, 2024
cs.AI
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors: Roman Ring, Machel Reid (Google DeepMind), N. Savinov, Rohan Anil (Google Brain), M. Isard, P. Barham, Zhen Yang, Ankesh Anand (Mila, University of Montreal), Karel Lenc, S. Haykal, Ce Zheng, Xi Chen, Ben Caine, N. Houlsby, Libin Bai, Luheng He, Yujia Li, Kelvin Xu, Biao Zhang (University of Edinburgh), Dawei Jia, Albert Webson (Google DeepMind), A. Morris, Bibo Xu, T. Paine, Fangyu Liu (Google DeepMind), Yunhan Xu, Rhys May, Boxi Wu, M. Krikun, A. Pritzel, Fan Yang, Kevin Hui, C. Yeh, P. Schuh, D. Chung, Kris Cao, G. Surita, Seb Noury, Lisa Lee, S. Vikram, J. Devlin, James Qin, M. Polacek, S. Kashem, Le Hou, Dong Li, Ada Ma, Antoine Yang (Google DeepMind), Nir Levine, Cheng Li, A. Guez, Tianhe Yu, Mina Khan, Keran Rong, V. Sharma, Nicola De Cao (The University of Edinburgh), Tim Blyth, A. Khodaei, S. Thakoor, Dian Yu, Iain Barr, Dan Horgan, Fei Xia, Ruibo Liu (Google DeepMind), M. Mauger, Ivy Zheng, Dan Hurt, Yao Zhao, S. Sarcar, J. Jia, Lucy Kim, Li Lao, Irene Cai, D. Cesare, Yelin Kim, B. Rosgen, Zora Tung, Jin Huang, Rui Zhu, W. Gierke, S. Yeganeh, Lu Li, Yaxin Liu, Tomy Tsai, C. Elkind, A. Wang, Xinyu Ye, Hoi Lam, M. Shukla
Abstract
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
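The "near-perfect recall on long-context retrieval tasks" claim above is typically measured with a needle-in-a-haystack evaluation: a single distinctive fact is buried at a chosen depth inside a long run of filler text, the model is asked to retrieve it, and recall is scored over many depth/length combinations. The sketch below illustrates that harness shape only; `ask_model` is a hypothetical stand-in (a string search, so the script is self-contained) where a real evaluation would call a long-context model API.

```python
import random

def build_haystack(needle: str, depth: float, n_filler: int = 1000) -> str:
    """Insert `needle` at a fractional `depth` within filler sentences."""
    filler = ["The grass is green and the sky is blue."] * n_filler
    pos = int(depth * len(filler))
    return " ".join(filler[:pos] + [needle] + filler[pos:])

def recall_score(answer: str, expected: str) -> float:
    """1.0 if the expected fact appears in the model's answer, else 0.0."""
    return 1.0 if expected.lower() in answer.lower() else 0.0

def ask_model(context: str, question: str) -> str:
    # Hypothetical stand-in: a real harness would send `context` and
    # `question` to a long-context model; here we just scan the context
    # so the sketch runs without any external dependency.
    for sentence in context.split("."):
        if "magic number" in sentence:
            return sentence.strip() + "."
    return "I don't know."

# Sweep a few insertion depths, as needle-in-a-haystack evaluations do.
scores = []
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    haystack = build_haystack("The magic number is 42.", depth)
    answer = ask_model(haystack, "What is the magic number?")
    scores.append(recall_score(answer, "42"))

print(sum(scores) / len(scores))  # mean recall across depths
```

A real evaluation additionally sweeps total context length (e.g. 128k up to 10M tokens) and plots recall as a depth-by-length heatmap.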