2601.04638v1 Jan 08, 2026 cs.CL

SpeechMedAssist: Efficiently and Effectively Adapting Speech Language Models for Medical Consultation

Sirry Chen

Jieyi Wang

Wei Chen

Zhongyu Wei

Abstract

Medical consultations are intrinsically speech-centric. However, most prior work focuses on long-text-based interactions, which are cumbersome and patient-unfriendly. Recent advances in speech language models (SpeechLMs) have enabled more natural speech-based interaction, yet the scarcity of medical speech data and the inefficiency of directly fine-tuning on speech data jointly hinder the adoption of SpeechLMs in medical consultation. In this paper, we propose SpeechMedAssist, a SpeechLM natively capable of conducting speech-based multi-turn interactions with patients. By exploiting the architectural properties of SpeechLMs, we decouple the conventional one-stage training into a two-stage paradigm consisting of (1) Knowledge & Capability Injection via Text and (2) Modality Re-alignment with Limited Speech Data, thereby reducing the requirement for medical speech data to only 10k synthesized samples. To evaluate SpeechLMs for medical consultation scenarios, we design a benchmark comprising both single-turn question answering and multi-turn simulated interactions. Experimental results show that our model outperforms all baselines in both effectiveness and robustness in most evaluation settings.
