2606.06034v1 Jun 04, 2026 cs.LG

When Good Enough Is Optimal: Multiplication-Only Matrix Inversion Approximation for Quantized Gated DeltaNet

Weili Zeng (Citations: 27, h-index: 2)
Yuwei Ren (Citations: 28, h-index: 4)
Lingjuan Ge (Citations: 15, h-index: 3)
Denghao Li (Citations: 4, h-index: 1)
M. H. Langston (Citations: 85, h-index: 6)
Liang Zhang (Citations: 11, h-index: 2)
Luoming Zhang (Citations: 112, h-index: 5)
Kui Zhang (Citations: 2, h-index: 1)
Tianyu Liu (Citations: 32, h-index: 3)
Yin-Ruey Huang (Citations: 34, h-index: 2)

Matrix inversion in chunk-wise parallel linear attention is a major bottleneck for long-context modeling, particularly on NPUs, where forward-substitution-based methods exhibit limited parallelism and poor hardware utilization. We propose a fast algorithm based on matrix multiplication (MatMul), tailored to the strictly lower-triangular matrices that arise in chunk-wise linear attention. Motivated by the rapid growth of Neumann-series terms and the diagonal concentration of the inverse matrix, we employ a truncated Neumann expansion with structural masking and parallel residual correction to eliminate sequential dependencies. We further extend our method to low-bit integer (INT) inference by mitigating the dynamic-range expansion caused by repeated matrix powers, and we adapt the approximation order and residual step to the chunk size to minimize computational cost while preserving the model's accuracy. Experiments on Qwen3.5-family models demonstrate up to 5$\times$ kernel-level speedup and a 20% reduction in decode-layer overhead, while preserving accuracy under both floating-point and low-precision inference. Our method offers an efficient and hardware-friendly solution for scalable linear attention.
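To make the idea of a multiplication-only inverse concrete, here is a minimal NumPy sketch. It is not the authors' kernel. It assumes the matrix to invert has the form I + A with A strictly lower-triangular, uses the standard product form of the Neumann series (repeated squaring), and uses a generic Newton-Schulz step as the residual correction. The function name `neumann_inverse` and the parameters `num_factors` and `residual_steps` are hypothetical, and structural masking and INT quantization are omitted.

```python
import numpy as np

def neumann_inverse(A, num_factors, residual_steps=0):
    """Approximate (I + A)^{-1} for strictly lower-triangular A using only matmuls.

    Illustrative sketch, not the paper's algorithm. With N = -A,
    (I + A)^{-1} = sum_k N^k, and the first 2**num_factors terms equal
    (I + N)(I + N^2)(I + N^4)... with num_factors factors.
    """
    n = A.shape[0]
    I = np.eye(n, dtype=A.dtype)
    N = -A
    X = I + N                  # series terms N^0 and N^1
    P = N
    for _ in range(num_factors - 1):
        P = P @ P              # N^(2^j) by repeated squaring
        X = X + P @ X          # multiply by (I + N^(2^j)): doubles the term count
    M = I + A
    for _ in range(residual_steps):
        # Newton-Schulz correction: the residual I - M X is squared each step
        X = X + X @ (I - M @ X)
    return X

# Usage: an 8x8 chunk with a strictly lower-triangular A
rng = np.random.default_rng(0)
A = np.tril(rng.uniform(-0.5, 0.5, (8, 8)), k=-1)
X_approx = neumann_inverse(A, num_factors=2, residual_steps=1)
```

Because a strictly lower-triangular n×n matrix is nilpotent (A^n = 0), the series terminates: in this sketch, three factors are already exact for n = 8, and two factors plus one correction step reach the same result, since the residual N^4 squares to N^8 = 0. Every operation is a dense matrix product or addition, so there is no row-by-row dependency as in forward substitution.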


