2605.29948v1 May 28, 2026 cs.SD

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

Hankun Wang
Kai Yu
Yiwei Guo
Colin Zhang
Shiyue Lian
Yu Xi
Da Zheng
Zhihan Li
Bohan Li

Unified speech foundation models require a holistic tokenization space that is both learnable by language models and decodable into high-quality waveforms. Existing speech tokenizers, however, often fail to satisfy these requirements simultaneously, leading to increased architectural complexity and more involved training designs. We propose HoliTok, a continuous Holistic speech Tokenization model designed for unified generation-understanding modeling. HoliTok encodes 48 kHz speech into a compact 25 Hz sequence of 128-dimensional latents. It is trained with a progressive strategy that jointly preserves signal-level fidelity, incorporates semantic information, and maintains strong latent learnability. Based on this tokenization, we build a unified AR+DiT model for speech synthesis and recognition, where the same latent sequence supports both generation-specific and unified generation-understanding tasks. Experiments show that HoliTok achieves competitive reconstruction fidelity, improves generative learnability for high-quality and controllable synthesis, and, among the evaluated representations, is the only one that operates robustly in our unified generation-understanding architecture without additional optimization tricks. These results suggest that HoliTok serves as an effective speech tokenizer and a foundational representation interface for unified spoken language modeling. The code is available at: https://github.com/bovod-sjtu/HoliTok.
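To make the rates quoted in the abstract concrete, the sketch below works out the arithmetic they imply. It is an illustration only: it assumes mono audio, a fixed hop between latent frames, and ceil-style padding of a trailing partial frame. The constants and the `latent_shape` helper are hypothetical names, not part of the HoliTok code release.

```python
# Hypothetical shape arithmetic for a tokenizer that maps 48 kHz audio to a
# 25 Hz sequence of 128-dim continuous latents (figures taken from the abstract).
# Assumptions (not from the paper): mono input, fixed hop, ceil padding.
import math

SAMPLE_RATE_HZ = 48_000   # input waveform sample rate
LATENT_RATE_HZ = 25       # latent frames per second
LATENT_DIM = 128          # values per latent frame

# Samples summarized by one latent frame under a fixed-hop assumption.
hop = SAMPLE_RATE_HZ // LATENT_RATE_HZ                 # 1920 samples (40 ms)

# Continuous values emitted per second of audio.
values_per_second = LATENT_RATE_HZ * LATENT_DIM        # 3200

# Ratio of raw mono samples to latent values per second.
compression = SAMPLE_RATE_HZ / values_per_second       # 15.0


def latent_shape(num_samples: int) -> tuple[int, int]:
    """Return (num_frames, LATENT_DIM) for a mono waveform of num_samples samples.

    Pads a trailing partial frame (ceil division); the real model's padding
    policy is not stated in the abstract.
    """
    return math.ceil(num_samples / hop), LATENT_DIM


if __name__ == "__main__":
    print(hop)                                  # 1920
    print(values_per_second)                    # 3200
    print(compression)                          # 15.0
    print(latent_shape(10 * SAMPLE_RATE_HZ))    # (250, 128)
```

Under these assumptions, each latent frame summarizes 1,920 samples (40 ms), and a 10-second clip becomes 250 frames of 128 values. The latent stream carries 3,200 values per second, about 15 times fewer than the raw 48 kHz mono waveform.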
