2606.06453v1 Jun 04, 2026 cs.AI

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

Yujia Liu
Beidi Chen
Michael Shieh
Zhihao Jia
Yang Zhou
Xin Zhong
Ranajoy Sadhukhan
Zhuoming Chen

Sparse attention is becoming increasingly important for serving large language models (LLMs) as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive, slowing both human researchers and AI agents as they explore the sparse attention design space. To address this challenge, we present Vortex, a system that combines a Python-embedded frontend language, built atop a page-centric tensor abstraction for expressing a broad range of sparse attention algorithms, with an efficient backend tightly integrated into modern LLM serving stacks. Vortex enables rapid prototyping, deployment, and evaluation of sparse attention algorithms, translating their theoretical efficiency gains into real-world throughput improvements. As a result, Vortex substantially accelerates the design and iteration of sparse attention algorithms. First, AI agents use Vortex to automatically generate and refine diverse algorithms, the best of which reaches up to $3.46\times$ higher throughput than full attention while preserving accuracy. Second, Vortex extends sparse attention to emerging architectures and very large models that are otherwise hard to experiment with, reaching up to $4.7\times$ higher throughput on the MLA-based GLM-4.7-Flash and $1.37\times$ on the 229B-parameter MiniMax-M2.7 on NVIDIA B200 GPUs.
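To make the idea of page-level sparse attention concrete, the following is a minimal pure-Python sketch of one generic scheme: the KV cache is split into fixed-size pages, each page is scored against the query, and softmax attention runs only over tokens in the highest-scoring pages. This is not Vortex's frontend language or API, which the abstract does not show. The function name `paged_sparse_attention`, its parameters, and the mean-key scoring rule are all assumptions chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def paged_sparse_attention(query, keys, values, page_size, top_k_pages):
    """Attend only to tokens in the top_k_pages highest-scoring KV pages.

    Hypothetical illustration; not the Vortex API.
    """
    num_tokens = len(keys)
    # Split token indices into fixed-size pages (the last page may be shorter).
    pages = [list(range(start, min(start + page_size, num_tokens)))
             for start in range(0, num_tokens, page_size)]

    # Assumed scoring rule: the query dotted with the page's mean key.
    def page_score(page):
        mean_key = [sum(keys[t][d] for t in page) / len(page)
                    for d in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(len(pages)),
                    key=lambda p: page_score(pages[p]), reverse=True)
    selected = sorted(ranked[:top_k_pages])
    token_ids = [t for p in selected for t in pages[p]]

    # Scaled dot-product softmax attention over the selected tokens only.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[t]) * scale for t in token_ids]
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    out = [sum(w * values[t][d] for w, t in zip(weights, token_ids)) / total
           for d in range(len(values[0]))]
    return out, selected
```

With `top_k_pages` equal to the total number of pages, the function reduces to full attention. Smaller values trade accuracy for less work per decoding step, which is the kind of trade-off a sparse attention algorithm has to manage.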

