2602.00250v2 Jan 30, 2026 cs.LG

TABES: Trajectory-Aware Backward-on-Entropy Steering for Masked Diffusion Models

Shreshth Saini (Citations: 27, h-index: 2)

N. Birkbeck (Citations: 2,335, h-index: 24)

Balu Adsumilli (Citations: 41, h-index: 4)

Yilin Wang (Citations: 3,176, h-index: 16)

Avinab Saha (Citations: 269, h-index: 7)

A. Bovik (Citations: 151,060, h-index: 116)

Original Abstract

Masked Diffusion Models (MDMs) have emerged as a promising non-autoregressive paradigm for generative tasks, offering parallel decoding and bidirectional context utilization. However, current sampling methods rely on simple confidence-based heuristics that ignore the long-term impact of local decisions, leading to trajectory lock-in where early hallucinations cascade into global incoherence. While search-based methods mitigate this, they incur prohibitive computational costs ($O(K)$ forward passes per step). In this work, we propose Backward-on-Entropy (BoE) Steering, a gradient-guided inference framework that approximates infinite-horizon lookahead via a single backward pass. We formally derive the Token Influence Score (TIS) from a first-order expansion of the trajectory cost functional, proving that the gradient of future entropy with respect to input embeddings serves as an optimal control signal for minimizing uncertainty. To ensure scalability, we introduce ActiveQueryAttention, a sparse adjoint primitive that exploits the structure of the masking objective to reduce backward pass complexity. BoE achieves a superior Pareto frontier for inference-time scaling compared to existing unmasking methods, demonstrating that gradient-guided steering offers a mathematically principled and efficient path to robust non-autoregressive generation. We will release the code.
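
For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of a confidence-based unmasking step, together with the predictive entropy of the positions that remain masked. It is not the authors' code. The names MASK, entropy, confidence_unmask_step, and remaining_entropy are hypothetical, and the per-position probabilities are a fixed toy table rather than the output of a real model, which would recompute them after every step.

```python
import math

MASK = None  # hypothetical marker for a position that is still masked


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_unmask_step(tokens, probs, k):
    """Greedy confidence heuristic: commit the k masked positions whose
    top-1 probability is highest, and leave every other position as is."""
    masked = [i for i, t in enumerate(tokens) if t is MASK]
    # Rank masked positions by their highest class probability.
    ranked = sorted(masked, key=lambda i: max(probs[i]), reverse=True)
    out = list(tokens)
    for i in ranked[:k]:
        # Commit the argmax token id at this position.
        out[i] = max(range(len(probs[i])), key=lambda v: probs[i][v])
    return out


def remaining_entropy(tokens, probs):
    """Total predictive entropy over positions that are still masked."""
    return sum(entropy(probs[i]) for i, t in enumerate(tokens) if t is MASK)


# Toy example: 4 positions, vocabulary of 3 token ids (assumed values).
tokens = [MASK, 2, MASK, MASK]
probs = [
    [0.70, 0.20, 0.10],
    [0.00, 0.00, 1.00],
    [0.40, 0.35, 0.25],
    [0.05, 0.90, 0.05],
]
after_one = confidence_unmask_step(tokens, probs, k=1)
```

The heuristic ranks positions only by their current confidence. It does not ask how committing a token would change the entropy of the positions that stay masked, which is the gap the abstract says BoE addresses with a gradient signal from a single backward pass.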

2 Citations

0 Influential

30 Altmetric

152.0 Score
