2605.26089v1 May 25, 2026 cs.CV

Channel-wise Vector Quantization

Wei Song
Tianhang Wang
Zuxuan Wu
Jiaqi Wang
Yi-Ting Chen
Tong Zhang
Ming Li
Kaicheng Yu

We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise tokens. Unlike conventional vector quantization, which assigns a discrete token to each patch feature vector, CVQ quantizes each channel of the feature map. This formulation represents an image as discrete levels of visual detail rather than as a grid of spatial patches. Building on CVQ, we introduce a new visual autoregressive framework based on "next-channel prediction". Instead of rendering an image patch by patch in raster order, our Channel-wise Autoregressive (CAR) model predicts image channels sequentially, progressively enriching visual detail: it first sketches the global structure and then refines fine-grained attributes, akin to a human artist's workflow. Empirically, we show that (1) CVQ achieves 100% codebook utilization with a codebook of 16K+ entries, without any bells and whistles, and substantially improves reconstruction quality over conventional VQ; and (2) CAR attains a DPG score of 86.7 and a GenEval score of 0.79, demonstrating strong text-to-image generation performance.
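A minimal sketch, in plain Python, of the difference between patch-wise and channel-wise quantization as described above. It assumes a C x H x W feature map and, as a simplification made only for illustration, that channel-wise VQ flattens each channel's H x W map into one vector and matches it against a codebook of (H*W)-dimensional codewords; the actual CVQ tokenizer may transform channels differently. The channel order stands in for the coarse-to-fine order that next-channel prediction relies on, which in practice comes from training and is not modeled here. All function names (`patch_wise_vq`, `channel_wise_vq`, `next_channel_pairs`, `codebook_utilization`) and the random data are hypothetical.

```python
import random

def nearest(vec, codebook):
    """Index of the codeword with the smallest squared L2 distance to vec."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def patch_wise_vq(fmap, codebook):
    """Conventional VQ: one token per spatial position (raster order).
    fmap is a C x H x W nested list; each codeword has length C."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    tokens = []
    for y in range(H):
        for x in range(W):
            patch_vec = [fmap[c][y][x] for c in range(C)]
            tokens.append(nearest(patch_vec, codebook))
    return tokens

def channel_wise_vq(fmap, codebook):
    """Channel-wise VQ (assumed form): one token per channel.
    Each H x W channel is flattened, so each codeword has length H*W."""
    tokens = []
    for channel in fmap:
        flat = [v for row in channel for v in row]
        tokens.append(nearest(flat, codebook))
    return tokens

def next_channel_pairs(channel_tokens):
    """Teacher-forcing pairs for next-channel prediction:
    the token of channel t is predicted from the tokens of channels 0..t-1."""
    return [(channel_tokens[:t], channel_tokens[t]) for t in range(len(channel_tokens))]

def codebook_utilization(tokens, codebook_size):
    """Fraction of codebook entries used at least once."""
    return len(set(tokens)) / codebook_size

random.seed(0)
C, H, W, K = 8, 4, 4, 16  # channels, height, width, codebook size (toy values)
fmap = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
patch_codebook = [[random.gauss(0, 1) for _ in range(C)] for _ in range(K)]
channel_codebook = [[random.gauss(0, 1) for _ in range(H * W)] for _ in range(K)]

patch_tokens = patch_wise_vq(fmap, patch_codebook)
channel_tokens = channel_wise_vq(fmap, channel_codebook)
print(len(patch_tokens), len(channel_tokens))  # 16 8
```

The sequence length changes from H*W (one token per patch) to C (one token per channel), and an autoregressive model then consumes channel tokens in order rather than patches in raster order. `codebook_utilization` uses the common definition (share of codewords selected at least once over the data), which is assumed here to match the sense of the 100% figure quoted above.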
