2605.29430v1 May 28, 2026 cs.AI

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Kai Yu
Zixu Jiang
Wupeng Wang
Xiangang Li
Xie Chen
Yanqiao Zhu
Zhifu Gao
Peng Wang
Qinyu Chen
Xinjian Zhao
Xipeng Qiu

Automatic speech recognition (ASR) is a core component of human-computer interaction and an increasingly important front-end for LLM-based assistants and agents. However, most current ASR systems still follow a single-pass paradigm, which is poorly aligned with human communication, where misunderstandings are resolved through iterative clarification and refinement. This mismatch makes it difficult to correct meaning-critical errors once they occur. Moreover, token-level metrics such as word error rate (WER) and character error rate (CER) do not adequately capture such errors. To address these limitations, we formulate Interactive ASR as a multi-turn refinement task and propose Agentic ASR, a closed-loop framework that combines a single-pass ASR front-end with semantic correction, intent routing, and reasoning-based editing. We further introduce the Sentence-level Semantic Error Rate ($S^2ER$), an LLM-based semantic evaluation metric, together with an Interactive Simulation System for scalable and reproducible benchmarking. Experiments on multilingual, named-entity-intensive, and code-switching benchmarks show that iterative interaction consistently reduces semantic errors, with much larger gains in $S^2ER$ than in conventional token-level metrics. Human-AI alignment and ablation studies further validate the reliability of the semantic judge and the robustness of the proposed framework. The code is available at https://interactiveasr.github.io/ and a live demo is available at https://i-asr.sjtuxlance.com/
