2602.11298v2 Feb 11, 2026 cs.AI

Voxtral Realtime: A Real-Time Automatic Speech Recognition Model

Voxtral Realtime

Thibaut Lavril
Guillaume Lample
Diego de Las Casas
Baptiste Rozière
Olivier Duchenne
Romain Sauvestre
Amélie Héliou
Alexander H. Liu
Andy Ehrenberg
Andy Lo
Chenkai Sun
Jean-Malo Delignon
K. Chandu
Patrick von Platen
Pavankumar Reddy Muddireddy
R. Arora
Sanchit Gandhi
Sandeep Subramanian
Soham Ghosh
Srijan Mishra
Abhinav Rastogi (Google Research)
Alan Jeffares
Albert Q. Jiang
Alexandre Sablayrolles
A. Bai
Angele Lenglemetz
Anmol Agarwal
Anton Eliseev
Antonia Calvi
Arjun Majumdar
Baptiste Bout
Baudouin De Monicault
Benjamin Tibi
Clémence Lanfranchi
Connor Chen
Corentin Barreau
Corentin Sautier (ENPC)
Cyprien Courtot
Darius Dabert
Elliot Chane-Sane
Enguerrand Paquin
Federico Baldassarre
Gabrielle Berrada
Gaetan Ecrepont
Gauthier Guinet
G. Hayes
Georgii Sergeevich Novikov (Skolkovo Institute of Science and Technology)
G. Pistilli
Guillaume Martin
Gunjan Dhanuka
Gunshi Gupta
Indraneel Mukherjee
Irene Zhang
Jaeyoung Kim
Jan Ludziejewski
Jason Rute
Joachim Studnia
John Harvill
Jonas Amar
Julien Tauran
Karmesh Yadav
Kartik Khandelwal
Kush Jain
Laurence Aitchison
Léonard Blier
Lingxiao Zhao
L. Martin
Lucile Saulnier
Luyu Gao
Maarten Buyl
Manan Sharma
Margaret Jennings
Marie Pellat
Mark Prins
Mathieu Poirée
Mathilde Guillaumin
Matthieu Dinot
Matthieu Futeral
Maxime Darrin
Maximilian Augustin
Mert Unsal
Mia Chiquier
Nathan Grinsztajn
N. Gupta
Olivier Bousquet
Patricia Wang
Paul Jacob
Paul Wambergue
Paula Kurylowicz
Philomène Chagniot
Pierre Stock
Piotr Miłoś
Pravesh Agrawal
Quentin Torroba
Ram Ramrakhya
R. Shah
Roman Soletskyi
R. Millner
S. Vaze
Samuel Humeau
Siddharth Gandhi
Sumukh Aithal
Szymon Antoniak
Teven Le Scao
Théo Cachet
Theo Simon Sorg
Thomas Chabal
Thomas Foubert
Thomas Robert
Thomas Wang
Tim Lawson (University of Bristol)
Tom Bewley
Tom Edwards
T. Wang
Valeriia Nemychnikova
Van Phung
Vedant Nanda
Victor Jouault
Virgile Richard
Vladislav V. Bataev
Wassim Bouaziz
Wen-Ding Li
William Marshall
Xinghui Li
Xingran Guo
Xinyu Yang
Yannic Neuhaus
Yihan Wang
Zaccharie Ramzi
Zhenlin Xu
Faruk Ahmed
Han Zhou (University of Cambridge)
Prateek Gupta
J. S. Roberts

This paper introduces Voxtral Realtime, a natively streaming automatic speech recognition model that delivers offline-level transcription quality at sub-second latency. Unlike existing approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime is trained end-to-end for streaming, with explicit alignment between the audio and text streams. The architecture builds on the Delayed Streams Modeling framework and introduces a new causal audio encoder and Ada RMS-Norm to improve delay conditioning. Pretraining is scaled to a large dataset spanning 13 languages. At a delay of 480 ms, Voxtral Realtime performs on par with Whisper, a widely used offline transcription system. The model weights are released under the Apache 2.0 license.

Original Abstract

We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime is trained end-to-end for streaming, with explicit alignment between audio and text streams. Our architecture builds on the Delayed Streams Modeling framework, introducing a new causal audio encoder and Ada RMS-Norm for improved delay conditioning. We scale pretraining to a large-scale dataset spanning 13 languages. At a delay of 480ms, Voxtral Realtime achieves performance on par with Whisper, the most widely deployed offline transcription system. We release the model weights under the Apache 2.0 license.
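The abstract names Ada RMS-Norm as the mechanism for delay conditioning but gives no formula. As a rough illustration only, the sketch below shows one plausible reading: a standard RMS normalization whose per-feature gain depends on the target delay. The function names, the linear mapping from delay to gain, and the `weights` and `bias` parameters are all assumptions made for this example and are not taken from the paper.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale a vector so that its root-mean-square is approximately 1."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def delay_gain(delay_ms, weights, bias):
    """Hypothetical conditioning: map a scalar delay to one gain per feature.

    A real model would likely use a learned embedding of the delay; a linear
    map is used here only to keep the example small.
    """
    delay_s = delay_ms / 1000.0  # delay expressed in seconds
    return [1.0 + w * delay_s + b for w, b in zip(weights, bias)]


def ada_rms_norm(x, delay_ms, weights, bias, eps=1e-6):
    """RMS-normalize x, then rescale each feature by a delay-dependent gain."""
    gains = delay_gain(delay_ms, weights, bias)
    return [g * v for g, v in zip(gains, rms_norm(x, eps))]


if __name__ == "__main__":
    features = [3.0, 4.0]
    # Same input, two target delays: only the gains differ.
    print(ada_rms_norm(features, 0, [0.5, -0.5], [0.0, 0.0]))
    print(ada_rms_norm(features, 480, [0.5, -0.5], [0.0, 0.0]))
```

With a delay of 0 the gains are all 1 and the result is plain RMS normalization; changing the delay shifts the gains while the normalized activations stay the same, which is the sense in which one set of weights could serve several latency settings.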

