2603.16292v1 Mar 17, 2026 cs.CL

Attention-guided Evidence Grounding for Spoken Question Answering

Authors

Bolin Chen (Citations: 0, h-index: 0)
Yuejie Li (Citations: 5, h-index: 1)
Yue-Yang He (Citations: 4, h-index: 1)
Chengjun Mao (Citations: 20, h-index: 2)
Ke Yang (Citations: 35, h-index: 3)
Yueying Hua (Citations: 27, h-index: 1)
Jian-Yun Nie (Citations: 21, h-index: 3)
Bowen Li (Citations: 1,082, h-index: 9)

Abstract

Spoken Question Answering (Spoken QA) presents a challenging cross-modal problem: effectively aligning acoustic queries with textual knowledge while avoiding the latency and error propagation inherent in cascaded ASR-based systems. In this paper, we introduce Attention-guided Evidence Grounding (AEG), a novel end-to-end framework that leverages the internal cross-modal attention of Speech Large Language Models (SpeechLLMs) to explicitly locate and ground key evidence in the model's latent space. To address the diffuse attention distribution in pre-trained models, we propose Learning to Focus on Evidence (LFE), a supervised fine-tuning paradigm that calibrates the model's attention mechanism to distinguish query-relevant segments from irrelevant context. Experiments on SQuAD, HotpotQA, and MuSiQue demonstrate that AEG reduces hallucinations and achieves strong efficiency gains, outperforming large-scale cascaded baselines (Whisper-Large-v3 + Reranker) while reducing inference latency by approximately 62%.
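
The abstract does not say how attention is turned into an evidence selection, so the following is only a minimal sketch of the general idea: score each context segment by the attention that spoken-query tokens place on it, then keep the highest-scoring segments. It assumes the cross-modal attention weights have already been extracted as a matrix (rows are query tokens, columns are context tokens) and that segment boundaries are known. The function names `segment_scores` and `top_k_segments`, the mean-per-token aggregation, and the top-k rule are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def segment_scores(
    attn: Sequence[Sequence[float]],
    segments: Sequence[Tuple[int, int]],
) -> List[float]:
    """Score context segments by the attention they receive.

    attn[q][c] is the (assumed, pre-extracted) attention weight from
    query token q to context token c. Each segment is a half-open
    (start, end) range over context-token indices.
    """
    n_query = len(attn)
    scores = []
    for start, end in segments:
        # Total attention mass on the segment, averaged over query tokens.
        mass = sum(sum(row[start:end]) for row in attn) / n_query
        # Normalize by segment length so long segments are not favored
        # (an assumption of this sketch, not something the abstract states).
        scores.append(mass / (end - start))
    return scores


def top_k_segments(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring segments, in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy attention: 2 query tokens over 4 context tokens.
    toy_attn = [
        [0.1, 0.1, 0.6, 0.2],
        [0.0, 0.2, 0.7, 0.1],
    ]
    toy_segments = [(0, 2), (2, 3), (3, 4)]
    toy_scores = segment_scores(toy_attn, toy_segments)
    print(top_k_segments(toy_scores, k=2))
```

In this toy input the second segment receives most of the attention, so it ranks first. Under this reading, a diffuse attention distribution would yield nearly uniform segment scores, which is the situation the abstract says LFE fine-tuning is meant to correct.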

Citations: 0 · Influential: 0 · Altmetric: 4.5 · Score: 22.5
