2602.11298v2 Feb 11, 2026 cs.AI

Voxtral Realtime: A Real-Time Automatic Speech Recognition Model

Voxtral Realtime

Thibaut Lavril

Citations: 44,624

h-index: 19

Guillaume Lample

Citations: 42,440

h-index: 27

Diego de Las Casas

Citations: 17,205

h-index: 13

Baptiste Rozière

Citations: 14,949

h-index: 10

Olivier Duchenne

Citations: 16,249

h-index: 14

Romain Sauvestre

Citations: 14,907

h-index: 9

Amélie Héliou

Citations: 1,311

h-index: 9

Alexander H. Liu

Citations: 31

h-index: 4

Andy Ehrenberg

Citations: 46

h-index: 3

A. Lo

Citations: 192

h-index: 4

Chenkai Sun

Citations: 299

h-index: 6

Jean-Malo Delignon

Citations: 43

h-index: 3

K. Chandu

Citations: 27

h-index: 2

Patrick von Platen

Citations: 9,432

h-index: 16

Pavankumar Reddy Muddireddy

Citations: 1

h-index: 1

R. Arora

Citations: 191

h-index: 6

Sanchit Gandhi

Citations: 3,168

h-index: 8

Sandeep Subramanian

Citations: 1,981

h-index: 5

Srijan Mishra

Citations: 65

h-index: 5

Abhinav Rastogi

Google Research

Citations: 6,416

h-index: 21

Alan Jeffares

Citations: 183

h-index: 8

Albert Q. Jiang

Citations: 2,553

h-index: 9

Alexandre Sablayrolles

Citations: 5,625

h-index: 12

A. Bai

Citations: 9

h-index: 1

Angele Lenglemetz

Citations: 1

h-index: 1

Anmol Agarwal

Citations: 43

h-index: 3

Anton Eliseev

Citations: 18

h-index: 1

Antonia Calvi

Citations: 40

h-index: 2

Arjun Majumdar

Citations: 76

h-index: 3

Baptiste Bout

Citations: 63

h-index: 4

Baudouin De Monicault

Citations: 60

h-index: 4

Benjamin Tibi

Citations: 1

h-index: 1

Clémence Lanfranchi

Citations: 69

h-index: 5

Connor Chen

Citations: 113

h-index: 4

Corentin Barreau

Citations: 40

h-index: 3

Corentin Sautier

ENPC

Citations: 323

h-index: 5

Cyprien Courtot

Citations: 20

h-index: 2

Darius Dabert

Citations: 54

h-index: 3

Elliot Chane-Sane

Citations: 135

h-index: 6

Enguerrand Paquin

Citations: 1

h-index: 1

Federico Baldassarre

Citations: 1

h-index: 1

Gabrielle Berrada

Citations: 63

h-index: 4

Gaetan Ecrepont

Citations: 18

h-index: 1

Gauthier Guinet

Citations: 86

h-index: 5

G. Hayes

Citations: 15

h-index: 1

Georgii Sergeevich Novikov

Skolkovo Institute of Science and Technology

Citations: 131

h-index: 6

Guillaume Martin

Citations: 27

h-index: 2

Gunjan Dhanuka

Citations: 5

h-index: 1

Gunshi Gupta

Citations: 249

h-index: 7

Indraneel Mukherjee

Citations: 303

h-index: 6

Jaeyoung Kim

Citations: 50

h-index: 3

Jan Ludziejewski

Citations: 327

h-index: 7

Jason Rute

Citations: 94

h-index: 6

Joachim Studnia

Citations: 211

h-index: 5

John Harvill

Citations: 5

h-index: 2

Jonas Amar

Citations: 27

h-index: 2

Julien Tauran

Citations: 1

h-index: 1

Karmesh Yadav

Citations: 27

h-index: 2

Kartik Khandelwal

Citations: 71

h-index: 5

Kush Jain

Citations: 66

h-index: 5

Laurence Aitchison

Citations: 138

h-index: 5

Léonard Blier

Citations: 331

h-index: 6

Lingxiao Zhao

Citations: 315

h-index: 6

L. Martin

Citations: 174

h-index: 3

Lucile Saulnier

Citations: 9,072

h-index: 12

Luyu Gao

Citations: 60

h-index: 4

M. Buyl

Citations: 380

h-index: 11

Manan Sharma

Citations: 10

h-index: 2

Margaret Jennings

Citations: 18

h-index: 1

Marie Pellat

Citations: 10,990

h-index: 10

Mark Prins

Citations: 18

h-index: 1

Mathieu Poirée

Citations: 18

h-index: 1

Mathilde Guillaumin

Citations: 60

h-index: 4

Matthieu Dinot

Citations: 79

h-index: 5

Matthieu Futeral

Citations: 128

h-index: 5

Maxime Darrin

Citations: 107

h-index: 7

Maximilian Augustin

Citations: 27

h-index: 2

Mert Unsal

Citations: 147

h-index: 3

Mia Chiquier

Citations: 72

h-index: 5

Nathan Grinsztajn

Citations: 558

h-index: 12

N. Gupta

Citations: 3,023

h-index: 25

Olivier Bousquet

Citations: 18

h-index: 1

Patricia Wang

Citations: 94

h-index: 5

Paul Jacob

Citations: 224

h-index: 6

P. Wambergue

Citations: 65

h-index: 5

Paula Kurylowicz

Citations: 90

h-index: 5

Philomène Chagniot

Citations: 65

h-index: 4

Pierre Stock

Citations: 5,411

h-index: 9

Piotr Miłoś

Citations: 28

h-index: 2

Pravesh Agrawal

Citations: 194

h-index: 4

Quentin Torroba

Citations: 18

h-index: 1

Ram Ramrakhya

Citations: 192

h-index: 7

R. Shah

Citations: 20

h-index: 2

Roman Soletskyi

Citations: 481

h-index: 8

R. Millner

Citations: 27

h-index: 2

S. Vaze

Citations: 1,738

h-index: 14

Samuel Humeau

Citations: 2,054

h-index: 15

Siddharth Gandhi

Citations: 104

h-index: 6

Sumukh Aithal

Citations: 66

h-index: 5

Szymon Antoniak

Citations: 2,194

h-index: 7

Teven Le Scao

Citations: 17,296

h-index: 21

Théo Cachet

Citations: 42

h-index: 3

Theo Simon Sorg

Citations: 18

h-index: 1

Thomas Chabal

Citations: 60

h-index: 3

Thomas Foubert

Citations: 73

h-index: 4

Thomas Robert

Citations: 27

h-index: 2

Thomas Wang

Citations: 2,005

h-index: 6

Tim Lawson

University of Bristol

Citations: 68

h-index: 4

Tom Bewley

Citations: 76

h-index: 6

Tom Edwards

Citations: 18

h-index: 1

T. Wang

Citations: 1

h-index: 1

Valeriia Nemychnikova

Citations: 61

h-index: 4

Van Phung

Citations: 248

h-index: 3

Vedant Nanda

Citations: 387

h-index: 8

Victor Jouault

Citations: 18

h-index: 1

Virgile Richard

Citations: 60

h-index: 4

Vladislav V. Bataev

Citations: 1

h-index: 1

Wassim Bouaziz

Citations: 526

h-index: 5

Wen-Ding Li

Citations: 60

h-index: 4

William Marshall

Citations: 201

h-index: 5

Xinghui Li

Citations: 19

h-index: 1

Xingran Guo

Citations: 10

h-index: 2

Xinyu Yang

Citations: 334

h-index: 8

Yannic Neuhaus

Citations: 68

h-index: 4

Yihan Wang

Citations: 25

h-index: 2

Zaccharie Ramzi

Citations: 733

h-index: 13

Zhenlin Xu

Citations: 54

h-index: 3

Faruk Ahmed

Citations: 7

h-index: 2

Han Zhou

University of Cambridge

Citations: 735

h-index: 13

Prateek Gupta

Citations: 142

h-index: 3

J. S. Roberts

Citations: 134

h-index: 6

Giada Pistilli

Citations: 10

h-index: 1

Soham Ghosh

Google;Mistral AI;Carnegie Mellon University

Citations: 536

h-index: 8

I. Zhang

Citations: 1

h-index: 1

This paper introduces Voxtral Realtime, a natively streaming automatic speech recognition model that delivers transcription quality on par with offline systems at sub-second latency. Unlike existing approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime is trained end-to-end for streaming, with explicit alignment between the audio and text streams. The architecture builds on the Delayed Streams Modeling framework and introduces a new causal audio encoder and Ada RMS-Norm to improve delay conditioning. Pretraining was carried out on a large-scale dataset spanning 13 languages. At a delay of 480 ms, Voxtral Realtime performs on par with Whisper, a widely used offline transcription system. The model weights are released under the Apache 2.0 license.

Original Abstract

We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime is trained end-to-end for streaming, with explicit alignment between audio and text streams. Our architecture builds on the Delayed Streams Modeling framework, introducing a new causal audio encoder and Ada RMS-Norm for improved delay conditioning. We scale pretraining to a large-scale dataset spanning 13 languages. At a delay of 480ms, Voxtral Realtime achieves performance on par with Whisper, the most widely deployed offline transcription system. We release the model weights under the Apache 2.0 license.
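The "explicit alignment between audio and text streams" at a fixed delay can be pictured with a toy sketch: each word is placed on the first audio frame at or after its end time plus the delay, and every other frame carries a padding token. This is an illustration only, not the paper's training pipeline. The 80 ms frame duration, the `<pad>` token, the function name `delayed_text_stream`, and the example timestamps are all assumptions made for the sketch; only the 480 ms delay comes from the abstract.

```python
PAD = "<pad>"  # hypothetical padding token for frames where no word is due


def delayed_text_stream(words, num_frames, frame_ms=80, delay_ms=480):
    """Build a per-frame text stream aligned to audio frames with a fixed delay.

    words: list of (text, end_time_ms) pairs, sorted by end time.
    frame_ms: assumed frame duration (not taken from the abstract).
    Returns a list of length num_frames holding one word or PAD per frame.
    """
    stream = [PAD] * num_frames
    next_free = 0  # earliest frame that is still unoccupied
    for text, end_ms in words:
        # Integer ceiling division: first frame index at or after end + delay.
        frame = -(-(end_ms + delay_ms) // frame_ms)
        # Preserve word order if two words would land on the same frame.
        frame = max(frame, next_free)
        if frame >= num_frames:
            break  # the word would be emitted after the audio ends
        stream[frame] = text
        next_free = frame + 1
    return stream


# Made-up timestamps: "hello" ends at 400 ms, "world" ends at 900 ms.
words = [("hello", 400), ("world", 900)]
stream = delayed_text_stream(words, num_frames=20)
```

With these assumed values, a word cannot appear in the text stream until 480 ms of audio beyond its end has been seen, which is the latency-for-context trade-off the abstract refers to.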

1 Citation

0 Influential

13.5 Altmetric

68.5 Score
