arXiv:2602.08676v3 · Feb 09, 2026 · cs.LG

LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Kevin I-Kai Wang, Yanmei Gu, Tiwei Bie, Maosong Cao, Bin Chen, Fu-Chen Chen, Kun Chen, Lun Du, Daozhuo Feng, Haibo Feng, Mingliang Gong, Zhuochen Gong, Jian Guan, Kaiyuan Guan, Hongliang He, Zenan Huang, Juyong Jiang, Zhonghui Jiang, Zhenzhong Lan, Chengxi Li, Jianguo Li, Zehuan Li, Huabin Liu, Lin Liu, Guoshan Lu, Yuan Lu, Yuxin Ma, X. Mou, Zhenxuan Pan, Kai Qiu, Yujie Ren, Jianfeng Tan, Yi Tian, Zian Wang, Lanning Wei, Tao Wu, Yipeng Xing, Wen-song Ye, Liangyu Zha, Tianze Zhang, Xiaolu Zhang, Junbo Zhao, Da Zheng, Hao Zhong, Wanli Zhong, Junlin Zhou, Liwang Zhu, Muzhi Zhu, Yihong Zhuang


Original Abstract

While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, a paradigm shift designed to transcend this trade-off. By seamlessly weaving Token-to-Token (T2T) editing into the conventional Mask-to-Token (M2T) scheme, we introduce a joint, configurable threshold-decoding scheme. This structural innovation gives rise to two distinct personas: the Speedy Mode (S Mode), which audaciously lowers the M2T threshold to bypass traditional constraints while relying on T2T to refine the output; and the Quality Mode (Q Mode), which leans into conservative thresholds to secure superior benchmark performance with manageable efficiency degradation. Furthering this evolution, underpinned by an expansive context window, we implement the first large-scale Reinforcement Learning (RL) framework specifically tailored for dLLMs, anchored by specialized techniques for stable gradient estimation. This alignment not only sharpens reasoning precision but also elevates instruction-following fidelity, bridging the chasm between diffusion dynamics and complex human intent. We culminate this work by releasing LLaDA2.1-Mini (16B) and LLaDA2.1-Flash (100B). Across 33 rigorous benchmarks, LLaDA2.1 delivers strong task performance and lightning-fast decoding speed. Despite its 100B parameter scale, on coding tasks it attains an astounding 892 TPS on HumanEval+, 801 TPS on BigCodeBench, and 663 TPS on LiveCodeBench.
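To make the abstract's joint threshold-decoding scheme concrete, here is a minimal sketch of one decoding step combining M2T commits and T2T edits. This is not the paper's implementation: the names `decode_step`, `tau_m2t`, `tau_t2t`, and the `MASK` sentinel are illustrative assumptions, and real dLLM decoding operates block-wise over model logits rather than a toy probability table.

```python
import numpy as np

MASK = -1  # sentinel for an undecoded (masked) position; an assumption of this sketch

def decode_step(tokens, probs, tau_m2t, tau_t2t):
    """One joint threshold-decoding step (illustrative).

    tokens: (L,) int array, MASK where a position is still undecoded
    probs:  (L, V) per-position token distributions from the model
    tau_m2t: confidence needed to commit a masked position (Mask-to-Token)
    tau_t2t: confidence needed to overwrite an already-committed token (Token-to-Token)
    """
    out = tokens.copy()
    conf = probs.max(axis=-1)     # model confidence at each position
    pred = probs.argmax(axis=-1)  # model's preferred token at each position
    for i in range(len(tokens)):
        if tokens[i] == MASK:
            if conf[i] >= tau_m2t:                       # M2T: fill confident masks
                out[i] = pred[i]
        elif pred[i] != tokens[i] and conf[i] >= tau_t2t:  # T2T: edit committed tokens
            out[i] = pred[i]
    return out

# Toy distributions over a 10-token vocabulary (made-up values)
tokens = np.array([MASK, 5, MASK])
probs = np.zeros((3, 10))
probs[0, 2], probs[1, 7], probs[2, 1] = 0.6, 0.95, 0.3

# S Mode flavor: a low M2T threshold commits aggressively, while T2T repairs
# position 1 (the model strongly prefers token 7 over the committed 5).
s_out = decode_step(tokens, probs, tau_m2t=0.5, tau_t2t=0.9)

# Q Mode flavor: a conservative M2T threshold leaves low-confidence masks for
# later steps, trading speed for quality.
q_out = decode_step(tokens, probs, tau_m2t=0.9, tau_t2t=0.9)
```

The two calls differ only in `tau_m2t`, which mirrors the abstract's framing: S Mode lowers the M2T bar and leans on T2T editing to clean up early commitments, while Q Mode keeps the bar high so fewer tokens ever need repair.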


