2601.02076v2 Jan 05, 2026 cs.CL

Deferred Commitment Decoding for Diffusion Language Models

Yuchuan Tian

Yingte Shu

Chao Xu

Hanting Chen

Yunhe Wang

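Diffusion language models (DLMs) have recently emerged as a strong alternative to autoregressive models because they allow text to be generated in parallel. To improve inference efficiency and KV-cache compatibility, existing work generally adopts block-based diffusion and decodes tokens one block at a time. This approach has a structural limitation called Boundary-Induced Context Truncation (BICT): tokens near a block boundary that have not yet been decoded must commit without access to the nearby future context, even though that context could greatly reduce their uncertainty. The limitation lowers decoding certainty and generation quality, and it is most pronounced on tasks that need precise reasoning, such as mathematical problem solving and code generation. The paper proposes Deferred Commitment Decoding (DCD), a new training-free decoding strategy that mitigates the problem. DCD maintains a sliding window that uses uncertainty information about the masked tokens; it commits low-uncertainty tokens quickly and defers high-uncertainty tokens until enough context is available. In extensive experiments across several diffusion language models, benchmarks, and caching configurations, DCD improved generation accuracy by 1.73% on average in a comparable amount of time relative to fixed block-based diffusion, and the largest improvement reached 16.5%. These results indicate that deferring token commitment based on uncertainty is a simple yet effective principle for improving both the quality and the efficiency of diffusion language model decoding.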

Original Abstract

Diffusion language models (DLMs) have recently emerged as a strong alternative to autoregressive models by enabling parallel text generation. To improve inference efficiency and KV-cache compatibility, prior work commonly adopts block-based diffusion, decoding tokens block by block. However, this paradigm suffers from a structural limitation that we term Boundary-Induced Context Truncation (BICT): undecoded tokens near block boundaries are forced to commit without access to nearby future context, even when such context could substantially reduce uncertainty. This limitation degrades decoding certainty and generation quality, especially for tasks requiring precise reasoning, such as mathematical problem solving and code generation. We propose Deferred Commitment Decoding (DCD), a novel, training-free decoding strategy that mitigates this issue. DCD maintains a certainty-aware sliding window over masked tokens, resolving low-uncertainty tokens early while deferring high-uncertainty tokens until sufficient contextual evidence becomes available. Extensive experiments across multiple diffusion language models, benchmarks, and caching configurations show that DCD improves generation accuracy by 1.73% with comparable time on average compared to fixed block-based diffusion methods, with the most significant improvement reaching 16.5%. These results demonstrate that deferring token commitment based on uncertainty is a simple yet effective principle for improving both the quality and efficiency of diffusion language model decoding.
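The deferral idea described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the paper's algorithm: the abstract does not specify how the window is sized, how certainty is measured, or what commit rule is used. The names `deferred_commit_decode`, `toy_predict`, `BASE`, the fixed `threshold`, and the "commit the single most confident token if nothing clears the threshold" fallback are all hypothetical choices made for illustration, and `toy_predict` stands in for a real DLM forward pass.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a position that has not been committed yet


def deferred_commit_decode(
    length: int,
    predict: Callable[[List[Optional[str]], int], Tuple[str, float]],
    window: int = 4,
    threshold: float = 0.8,
) -> Tuple[List[Optional[str]], List[int]]:
    """Toy certainty-aware sliding-window decoder (illustrative assumption).

    `predict(tokens, i)` is a hypothetical stand-in for a model call that
    returns a (token, confidence) pair for masked position i.
    """
    tokens: List[Optional[str]] = [MASK] * length
    commit_order: List[int] = []
    while any(t is MASK for t in tokens):
        # The window covers the leftmost `window` masked positions, so it
        # slides right as tokens are committed; deferred tokens stay inside.
        masked = [i for i, t in enumerate(tokens) if t is MASK][:window]
        # Score every position in the window against the same partial sequence.
        scored = [(i, *predict(tokens, i)) for i in masked]
        confident = [s for s in scored if s[2] >= threshold]
        if not confident:
            # Assumed fallback: commit the single most confident token so
            # that the loop always makes progress.
            confident = [max(scored, key=lambda s: s[2])]
        for i, tok, _ in confident:
            tokens[i] = tok
            commit_order.append(i)
    return tokens, commit_order


# Hypothetical toy "model": each position has a base confidence that rises
# by 0.3 for every adjacent position that is already decoded.
BASE = [0.9, 0.25, 0.9, 0.9, 0.4, 0.9]


def toy_predict(tokens: List[Optional[str]], i: int) -> Tuple[str, float]:
    neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
    decoded = sum(tokens[j] is not MASK for j in neighbours)
    return f"tok{i}", min(1.0, BASE[i] + 0.3 * decoded)


tokens, order = deferred_commit_decode(6, toy_predict, window=3, threshold=0.8)
print(tokens)
print(order)
```

In this toy run, positions 1 and 4 start with low confidence and are committed only after their neighbours have been decoded, which is the behaviour the abstract attributes to deferring high-uncertainty tokens. A fixed-block decoder would instead have to commit every token in a block before moving on.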
