2604.24278v1 Apr 27, 2026 cs.SD


RAS: a Reliability Oriented Metric for Automatic Speech Recognition

Hankun Wang

Jing Peng

Xie Chen

Bohan Li

Kai Yu

Wen-Chin Huang

Nagoya University

Yuhang Qiu

Yiwei Guo

Abstract

Automatic speech recognition systems often produce confident yet incorrect transcriptions under noisy or ambiguous conditions, which can be misleading for both users and downstream applications. Standard evaluation based on Word Error Rate focuses solely on accuracy and fails to capture transcription reliability. We introduce an abstention-aware transcription framework that enables ASR models to explicitly abstain from uncertain segments. To evaluate reliability under abstention, we propose RAS, a reliability-oriented metric that balances transcription informativeness and error aversion, with its trade-off parameter calibrated by human preference. We then train an abstention-aware ASR model through supervised bootstrapping followed by reinforcement learning. Our experiments demonstrate substantial improvements in transcription reliability while maintaining competitive accuracy.
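To make the informativeness-versus-error-aversion trade-off concrete, below is a toy sketch. It is not the RAS definition, which the abstract does not state. The abstention token `<abs>`, the pre-aligned (reference, hypothesis) word pairs, the formula `(correct - lam * errors) / N`, and the default `lam = 2.0` are all hypothetical choices for illustration. In the paper the trade-off parameter is calibrated by human preference.

```python
from dataclasses import dataclass

ABSTAIN = "<abs>"  # hypothetical abstention marker, not the paper's token


@dataclass
class Counts:
    correct: int
    errors: int
    abstained: int


def count_aligned(pairs):
    """Tally word-level outcomes from pre-aligned (reference, hypothesis) pairs.

    A hypothesis of None marks a deletion; a reference of None marks an insertion.
    """
    correct = errors = abstained = 0
    for ref, hyp in pairs:
        if hyp == ABSTAIN:
            abstained += 1  # model explicitly declined to transcribe this word
        elif ref is not None and hyp == ref:
            correct += 1
        else:
            errors += 1  # substitution, deletion, or insertion
    return Counts(correct, errors, abstained)


def reference_length(pairs):
    n_ref = sum(1 for ref, _ in pairs if ref is not None)
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return n_ref


def wer_like(pairs):
    """WER-style rate that counts an abstention as an ordinary substitution."""
    c = count_aligned(pairs)
    return (c.errors + c.abstained) / reference_length(pairs)


def toy_reliability_score(pairs, lam=2.0):
    """Hypothetical trade-off score (NOT the paper's RAS).

    Rewards correct words (informativeness), penalizes errors by lam
    (error aversion), and gives zero credit or penalty to abstentions.
    """
    c = count_aligned(pairs)
    return (c.correct - lam * c.errors) / reference_length(pairs)


confident = [("the", "the"), ("cat", "hat"), ("sat", "sat")]
abstaining = [("the", "the"), ("cat", ABSTAIN), ("sat", "sat")]
print(wer_like(confident), wer_like(abstaining))
print(toy_reliability_score(confident), toy_reliability_score(abstaining))
```

With these inputs both hypotheses get the same WER-style rate of 1/3, while the toy score prefers abstaining on the uncertain word (about 0.67 versus 0.0). This is the kind of distinction the abstract says WER alone cannot capture.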
