2603.20042v1 Mar 20, 2026 cs.CL

LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families

Xiaoxue Gao

Nancy F. Chen

Jianan Chen

Tatsuya Kawahara

Original Abstract

Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong performance in automatic speech recognition (ASR) under high-resource conditions. However, existing benchmarks predominantly focus on high-resource languages, leaving the ASR behavior of SpeechLMs in low-resource languages insufficiently understood. This gap is critical, as practical ASR systems must reliably support low-resource languages and generalize across diverse language families, and it directly hinders the deployment of SpeechLM-based ASR in real-world multilingual scenarios. As a result, it is essential to evaluate SpeechLMs on low-resource languages to ensure their generalizability across different language families. To address this problem, we propose LoASR-Bench, a comprehensive benchmark designed to evaluate low-resource automatic speech recognition (ASR) of the latest SpeechLMs across diverse language families. LoASR-Bench comprises 25 languages from 9 language families, featuring both Latin and non-Latin scripts, enabling cross-linguistic and cross-script assessment of ASR performance of current SpeechLMs. Experimental results highlight the limitations of the latest SpeechLMs in handling real-world low-resource languages.
