2605.29430v1 May 28, 2026 cs.AI

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

Kai Yu, Zixu Jiang, Wupeng Wang, Xiangang Li, Xie Chen, Yanqiao Zhu, Zhifu Gao, Peng Wang, Qinyu Chen, Xinjian Zhao, Xipeng Qiu

Automatic speech recognition (ASR) is a core component of human-computer interaction and an increasingly important front-end for LLM-based assistants and agents. However, most current ASR systems still follow a single-pass paradigm, which is poorly aligned with human communication, where misunderstandings are resolved through iterative clarification and refinement. This mismatch makes it difficult to correct meaning-critical errors once they occur. Moreover, token-level metrics such as word error rate (WER) and character error rate (CER) do not adequately capture these meaning-critical errors. To address these limitations, we formulate Interactive ASR as a multi-turn refinement task and propose Agentic ASR, a closed-loop framework that combines a single-pass ASR front-end with semantic correction, intent routing, and reasoning-based editing. We further introduce the Sentence-level Semantic Error Rate ($S^2ER$), an LLM-based semantic evaluation metric, together with an Interactive Simulation System for scalable and reproducible benchmarking. Experiments on multilingual, named-entity-intensive, and code-switching benchmarks show that iterative interaction consistently reduces semantic errors, with much larger gains in $S^2ER$ than in conventional token-level metrics. Human-AI alignment and ablation studies further validate the reliability of the semantic judge and the robustness of the proposed framework. The code is available at https://interactiveasr.github.io/ and a live demo is available at https://i-asr.sjtuxlance.com/.
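To make the evaluation idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give a formula for $S^2ER$, so the sketch assumes it is the fraction of sentences that a judge marks as not preserving the reference meaning. All names here (`toy_judge`, `toy_corrector`, `sentence_semantic_error_rate`, `refine`) are hypothetical; the judge is a normalized string comparison standing in for the LLM-based judge, and the corrector stands in for simulated user feedback by fixing one word per turn.

```python
from typing import Callable, List, Tuple

Judge = Callable[[str, str], bool]
Corrector = Callable[[str, str], str]


def toy_judge(reference: str, hypothesis: str) -> bool:
    """Stand-in for an LLM judge: True if the hypothesis keeps the meaning.

    Assumption: meaning is approximated by case- and whitespace-insensitive
    equality, which is far cruder than a real semantic judge.
    """
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    return normalize(reference) == normalize(hypothesis)


def toy_corrector(hypothesis: str, reference: str) -> str:
    """Stand-in for one round of user feedback: repair the first wrong word."""
    hyp_words = hypothesis.split()
    ref_words = reference.split()
    for i, ref_word in enumerate(ref_words):
        if i >= len(hyp_words) or hyp_words[i].lower() != ref_word.lower():
            return " ".join(ref_words[: i + 1] + hyp_words[i + 1:])
    # Every reference word matched, so any remaining words are extras.
    return " ".join(ref_words)


def sentence_semantic_error_rate(
    pairs: List[Tuple[str, str]], judge: Judge
) -> float:
    """Assumed definition: share of (reference, hypothesis) pairs judged wrong."""
    if not pairs:
        raise ValueError("need at least one sentence pair")
    errors = sum(1 for ref, hyp in pairs if not judge(ref, hyp))
    return errors / len(pairs)


def refine(
    hypothesis: str,
    reference: str,
    judge: Judge,
    corrector: Corrector,
    max_turns: int = 3,
) -> Tuple[str, int]:
    """Closed loop: keep correcting until the judge accepts or turns run out."""
    turns = 0
    while turns < max_turns and not judge(reference, hypothesis):
        hypothesis = corrector(hypothesis, reference)
        turns += 1
    return hypothesis, turns


if __name__ == "__main__":
    data = [
        ("call Xie Chen at noon", "call she chen at noon"),
        ("book a flight", "book a flight"),
    ]
    print("single pass:", sentence_semantic_error_rate(data, toy_judge))
    refined = [
        (ref, refine(hyp, ref, toy_judge, toy_corrector)[0]) for ref, hyp in data
    ]
    print("after interaction:", sentence_semantic_error_rate(refined, toy_judge))
```

Because the metric is computed per sentence, a single misrecognized named entity counts as one full error in this sketch, whereas a token-level metric would register only one wrong word out of five. That difference is one plausible reading of why the abstract reports larger gains in $S^2ER$ than in WER or CER.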
