2603.22267v1 Mar 23, 2026 cs.CL

TiCo: Time-Controllable Training for Spoken Dialogue Models

Kai-Wei Chang

MIT CSAIL

Citations: 596

h-index: 13

James Glass

Citations: 11

h-index: 2

Wei-Chih Chen

Citations: 303

h-index: 10

En-Pei Hu

Citations: 224

h-index: 6

Hung-yi Lee

Citations: 526

h-index: 11

Abstract

We propose TiCo, a simple post-training method for enabling spoken dialogue models (SDMs) to follow time-constrained instructions and generate responses with controllable duration. This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality. However, despite their strong ability to generate natural spoken responses, existing models lack time awareness and struggle to follow duration-related instructions (e.g., "Please generate a response lasting about 15 seconds"). Through an empirical evaluation of both open-source and commercial SDMs, we show that they frequently fail to satisfy such time-control requirements. TiCo addresses this limitation by enabling models to estimate elapsed speaking time during generation through Spoken Time Markers (STM) (e.g., <10.6 seconds>). These markers help the model maintain awareness of time and adjust the remaining content to meet the target duration. TiCo is simple and efficient: it requires only a small amount of data and no additional question-answer pairs, relying instead on self-generation and reinforcement learning. Experimental results show that TiCo significantly improves adherence to duration constraints while preserving response quality.
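As a rough illustration of the Spoken Time Marker idea described in the abstract, the sketch below interleaves elapsed-time markers with a response transcript and measures how far a response's duration lands from its target. This is not the authors' implementation. The function names, the word-level timestamps, and the 5-second marker interval are assumptions made for the example; only the marker format follows the abstract's `<10.6 seconds>` example.

```python
from typing import List, Tuple


def insert_time_markers(words: List[Tuple[str, float]], interval: float = 5.0) -> str:
    """Interleave elapsed-time markers with words (hypothetical helper).

    `words` holds (word, end_time_in_seconds) pairs. A marker is emitted
    after the first word whose end time reaches each multiple of `interval`.
    The interval is an assumption; the abstract does not specify one.
    """
    pieces = []
    next_mark = interval
    for word, end_time in words:
        pieces.append(word)
        if end_time >= next_mark:
            pieces.append(f"<{end_time:.1f} seconds>")
            # Skip any interval boundaries this word has already passed.
            while next_mark <= end_time:
                next_mark += interval
    return " ".join(pieces)


def duration_error(target_seconds: float, actual_seconds: float) -> float:
    """Relative deviation from the target duration (0.0 is a perfect match).

    An assumed metric for illustration, not the paper's reward definition.
    """
    return abs(actual_seconds - target_seconds) / target_seconds


if __name__ == "__main__":
    timed_words = [("Hello", 0.4), ("there", 0.9), ("friend", 5.2), ("bye", 10.6)]
    print(insert_time_markers(timed_words))
    print(duration_error(15.0, 12.0))
```

A signal like `duration_error` could in principle feed a reinforcement-learning reward, but how TiCo actually defines its reward is not stated in the abstract.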
