2601.15593v1 Jan 22, 2026 cs.CL

Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow

Authors: Zhengqing Zang, Yuqi Ding, Yanmei Gu, Zhenzhong Lan, Junlin Zhou, Liwang Zhu, Yangyang Zhong, Xiaomeng Li, Xibei Jia, Yu-Hong Shen, Zhongyi Yu, Pengxin Luo, Donglian Qi, Yunfeng Yan, Junbo Zhao, Weiping Liu, Haisheng Liu

Affiliation listed for Zhongyi Yu: University of Edinburgh; Beijing Normal University-Hong Kong Baptist University United International College (UIC)

Abstract

Masked Diffusion Language Models (MDLMs) promise parallel token generation and arbitrary-order decoding, yet it remains unclear to what extent current models truly realize these capabilities. We characterize MDLM behavior along two dimensions -- parallelism strength and generation order -- using Average Finalization Parallelism (AFP) and Kendall's tau. We evaluate eight mainstream MDLMs (up to 100B parameters) on 58 benchmarks spanning knowledge, reasoning, and programming. The results show that MDLMs still lag behind comparably sized autoregressive models, mainly because parallel probabilistic modeling weakens inter-token dependencies. Meanwhile, MDLMs exhibit adaptive decoding behavior: their parallelism and generation order vary significantly with the task domain, the stage of reasoning, and whether the output is correct. On tasks that require "backward information" (e.g., Sudoku), MDLMs adopt a solution order that tends to fill easier Sudoku blanks first, highlighting their advantages. Finally, we provide theoretical motivation and design insights supporting a Generate-then-Edit paradigm, which mitigates dependency loss while retaining the efficiency of parallel decoding.
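
To make the two measurement dimensions concrete, here is a minimal sketch of how generation order and parallelism could be scored from a decoding trace. The abstract names Kendall's tau and Average Finalization Parallelism (AFP) but does not define AFP, so `tokens_per_step` below is only a hypothetical stand-in (mean number of tokens finalized per decoding step), not the paper's metric. The trace format (`finalize_step[i]` is the step at which position `i` was unmasked for good) and all function names are likewise assumptions made for illustration.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: tied pairs count as neither concordant nor discordant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def tokens_per_step(finalize_step):
    """Hypothetical parallelism proxy (NOT the paper's AFP definition):
    mean number of tokens finalized per decoding step that finalized anything."""
    return len(finalize_step) / len(set(finalize_step))


# Hypothetical traces: finalize_step[i] is the step at which position i was finalized.
positions = [0, 1, 2, 3, 4, 5]
left_to_right = [0, 1, 2, 3, 4, 5]   # strictly autoregressive-like order
right_to_left = [5, 4, 3, 2, 1, 0]   # fully reversed order
two_per_step = [0, 0, 1, 1, 2, 2]    # two tokens finalized in each step

for name, trace in [("left_to_right", left_to_right),
                    ("right_to_left", right_to_left),
                    ("two_per_step", two_per_step)]:
    print(name, kendall_tau(positions, trace), tokens_per_step(trace))
```

Under these assumptions, a tau near 1 indicates left-to-right decoding, a tau near -1 indicates reversed decoding, and a higher tokens-per-step value indicates that more positions are committed in parallel. For larger traces or tie-corrected variants, a library implementation such as `scipy.stats.kendalltau` would be the more practical choice.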
