2605.07230v1 May 08, 2026 cs.CV

CASCADE: Context-Aware Relaxation for Speculative Image Decoding

Vikram V. Appia (Citations: 392; h-index: 12)

Selin Yildirim (Citations: 0; h-index: 0)

Subhajit Dutta Chowdhury (Citations: 85; h-index: 4)

Mohammad Mahdi Kamani, Pennsylvania State University (Citations: 1,696; h-index: 13)

Deming Chen (Citations: 775; h-index: 5)

Original Abstract

Autoregressive generation is a powerful approach for high-fidelity image synthesis, but it remains computationally demanding and slow even on the most advanced accelerators. While speculative decoding has been explored to mitigate this bottleneck, existing approaches fail to achieve efficiency gains comparable to those observed in text generation. A key limitation is the target model's high uncertainty during image generation, which leads to high draft token rejection rates. In this work, we identify previously overlooked patterns in the target model's behavior that emerge naturally in tree-based speculative decoding. Specifically, we formalize two properties, semantic interchangeability and convergence, arising from the redundancies in the target model's hidden state representations. By capturing these redundancies across the depth and breadth of the predicted token tree, our method identifies principled opportunities for acceptance relaxation without requiring additional training. Additionally, we enhance standalone drafter performance by injecting the redundancy signals from the target model into drafter training with minimal modification. We evaluate our approach across multiple text-to-image models and drafter architectures. Results show that CASCADE achieves state-of-the-art speedups for drafter-based speculative decoding, with up to 3.6x acceleration, while maintaining image quality and text-prompt fidelity.
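
For readers unfamiliar with the acceptance step the abstract refers to, the following is a minimal sketch, not the authors' implementation. It shows the standard speculative sampling rule, which accepts a drafted token with probability min(1, p/q), alongside one hypothetical form of relaxation in which the target probability is pooled over a set of tokens treated as interchangeable with the drafted one. The function names, the `interchangeable` argument, and the pooling rule are all assumptions made for illustration; the abstract does not specify how CASCADE identifies such sets or how it relaxes acceptance.

```python
import random


def accept_prob(p_target, q_draft, token, interchangeable=None):
    """Acceptance probability for one drafted token.

    Standard speculative sampling accepts with min(1, p/q), where p and q are
    the target and draft probabilities of the drafted token.
    ASSUMPTION: if `interchangeable` (a hypothetical set of tokens treated as
    equivalent to `token`) is given, the target mass is pooled over that set.
    """
    group = {token} | set(interchangeable or ())
    p = sum(p_target.get(t, 0.0) for t in group)
    q = q_draft[token]
    return min(1.0, p / q)


def verify(draft_tokens, p_targets, q_drafts, groups=None, rng=random):
    """Accept drafted tokens left to right and stop at the first rejection.

    p_targets[i] and q_drafts[i] are dicts mapping token -> probability at
    position i. groups[i] is the hypothetical interchangeable set (or None).
    Returns the number of accepted draft tokens.
    """
    accepted = 0
    for i, tok in enumerate(draft_tokens):
        group = groups[i] if groups else None
        if rng.random() < accept_prob(p_targets[i], q_drafts[i], tok, group):
            accepted += 1
        else:
            break
    return accepted


if __name__ == "__main__":
    p = {"a": 0.2, "b": 0.3, "c": 0.5}  # toy target distribution
    q = {"a": 0.6, "b": 0.2, "c": 0.2}  # toy draft distribution
    print("strict :", accept_prob(p, q, "a"))
    print("relaxed:", accept_prob(p, q, "a", interchangeable={"b"}))
```

In this toy setup, pooling raises the acceptance probability of token "a" because the target's mass on "b" is counted toward it. Any relaxation of this kind generally stops reproducing the target distribution exactly, which is why the abstract reports image quality and text-prompt fidelity alongside speedup.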

0 Citations

0 Influential

6.5 Altmetric

32.5 Score
