2601.13599v2 Jan 20, 2026 cs.LG

Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion

Yunhe Wang (Citations: 341, h-index: 9)

Kai Han (Citations: 7, h-index: 2)

Yufei Cui (Citations: 6, h-index: 2)

Linrui Ma (Citations: 7, h-index: 2)

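One of the most appealing features of global discrete diffusion language models is their ability to understand global bidirectional context. However, existing block-based diffusion work often introduces autoregressive priors, which offer benefits but can cause models to lose global coherence at the macro level. In this work, we propose Diffusion in Diffusion, a 'draft-then-refine' framework that recovers global contextual understanding while preserving the advantages of the semi-autoregressive paradigm. Our approach first generates fast drafts via block diffusion with small blocks, then refines these drafts through global bidirectional diffusion with a wider bidirectional receptive field. We use snapshot confidence remasking to identify the most critical tokens that need revision, and apply mix-scale training to extend the global capabilities of the block diffusion model. Experimental results show that our approach sets a new benchmark for discrete diffusion models on the OpenWebText dataset. Using only 26% of the baseline models' fine-tuning budget, it reduces generative perplexity from 25.7 to 21.9, substantially narrowing the performance gap with autoregressive models.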

Original Abstract

One of the most compelling features of global discrete diffusion language models is their global bidirectional contextual capability. However, existing block-based diffusion studies tend to introduce autoregressive priors, which, while offering benefits, can cause models to lose this global coherence at the macro level. To regain global contextual understanding while preserving the advantages of the semi-autoregressive paradigm, we propose Diffusion in Diffusion, a 'draft-then-refine' framework designed to overcome the irreversibility and myopia problems inherent in block diffusion models. Our approach first employs block diffusion to generate rapid drafts using small blocks, then refines these drafts through global bidirectional diffusion with a larger bidirectional receptive field. We utilize snapshot confidence remasking to identify the most critical tokens that require modification, and apply mix-scale training to expand the block diffusion model's global capabilities. Empirical results demonstrate that our approach sets a new benchmark for discrete diffusion models on the OpenWebText dataset. Using only 26% of the fine-tuning budget of baseline models, we reduce generative perplexity from 25.7 to 21.9, significantly narrowing the performance gap with autoregressive models.
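
The draft-then-refine flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The denoiser is a random stand-in (`toy_denoiser`), each block is filled in a single step rather than over several diffusion steps, and the reading of "snapshot confidence" as the confidence recorded at the moment a token is committed during drafting is an assumption. All function names, the `MASK` id, and the `remask_ratio` parameter are hypothetical.

```python
import random

MASK = -1  # hypothetical mask token id, chosen for illustration only


def toy_denoiser(tokens, positions, rng):
    """Random stand-in for a denoiser: a (token, confidence) pair per position."""
    return {p: (rng.randrange(100), rng.random()) for p in positions}


def draft_with_blocks(length, block_size, rng):
    """Stage 1: fill the sequence left to right, one small block at a time."""
    tokens = [MASK] * length
    snapshot_conf = [0.0] * length
    for start in range(0, length, block_size):
        block = list(range(start, min(start + block_size, length)))
        for p, (tok, conf) in toy_denoiser(tokens, block, rng).items():
            tokens[p] = tok
            snapshot_conf[p] = conf  # assumed: confidence saved at commit time
    return tokens, snapshot_conf


def remask_lowest_confidence(tokens, snapshot_conf, ratio):
    """Re-mask the fraction `ratio` of tokens with the lowest saved confidence."""
    k = int(len(tokens) * ratio)
    order = sorted(range(len(tokens)), key=lambda i: snapshot_conf[i])
    remasked = sorted(order[:k])
    out = list(tokens)
    for p in remasked:
        out[p] = MASK
    return out, remasked


def refine_globally(tokens, remasked, rng):
    """Stage 2: re-predict all re-masked positions in one whole-sequence pass."""
    out = list(tokens)
    for p, (tok, _) in toy_denoiser(tokens, remasked, rng).items():
        out[p] = tok
    return out


def draft_then_refine(length=16, block_size=4, remask_ratio=0.25, seed=0):
    rng = random.Random(seed)
    draft, conf = draft_with_blocks(length, block_size, rng)
    masked, remasked = remask_lowest_confidence(draft, conf, remask_ratio)
    final = refine_globally(masked, remasked, rng)
    return draft, remasked, final
```

Calling `draft_then_refine()` returns the draft, the re-masked positions, and the refined sequence. Only the re-masked positions can differ between draft and final output, which is the sense in which the second stage revises earlier block decisions instead of leaving them fixed.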

2 Citations

0 Influential

4.5 Altmetric

24.5 Score
