2605.25475v1 May 25, 2026 cs.CL

IndexMem: Learned KV-Cache Eviction with Latent Memory for Long-Context LLM Inference

Yike Guo
Bei Liu
Jiacheng Liu
Hao Gu
Lujun Li
Qiyuan Zhu
Sirui Han
Bin Xu
Xintong Yang

Large Language Models (LLMs) are increasingly expected to operate over long contexts, yet standard softmax attention incurs a KV cache that grows linearly with sequence length, quickly becoming the bottleneck for long-context inference. A practical remedy is to evict less important KV entries; however, existing eviction policies are largely heuristic and struggle to capture the rich, input-dependent distribution of token importance. In this work, we introduce a learnable indexer that predicts KV importance, enabling more accurate retention of critical tokens. Naively evicting tokens, however, permanently discards their information, leading to irreversible forgetting and degraded retrieval over long ranges. To address this, we propose a lightweight latent memory module that compresses evicted tokens into a compact, online-updated state and provides residual readouts to compensate for the attention contributions lost through KV eviction. Together, these components enable accurate long-context inference under a bounded KV budget, delivering consistent improvements on RULER (4K/16K) across Qwen, Mistral, and Llama models (up to 25 points under aggressive eviction), markedly more stable Needle-in-a-Haystack retrieval, and superior LongBench scores and compression curves compared to existing eviction policies.
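To make the two ideas in the abstract concrete, the following is a minimal sketch in pure Python of score-based KV eviction under a fixed budget, with evicted entries folded into a small running state that is read back as a residual. It is not the paper's implementation: the abstract does not specify the indexer architecture or the form of the latent memory, so `scores` simply stands in for the output of a learned indexer, and the memory is assumed here to be a linear-attention-style accumulator with an ELU+1 feature map. The function names `feature_map`, `evict_with_memory`, and `attend` are hypothetical.

```python
import math


def feature_map(x):
    """Positive feature map (ELU + 1); an assumed choice, not taken from the paper."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def evict_with_memory(keys, values, scores, budget, memory=None):
    """Keep the `budget` highest-scoring KV pairs and fold the rest into a latent state.

    keys, values: lists of float vectors, one per cached token.
    scores: per-token importance (placeholder for a learned indexer's output).
    memory: (S, z) from a previous call, or None to start from zeros.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    if memory is None:
        S = [[0.0] * d_v for _ in range(d_k)]
        z = [0.0] * d_k
    else:
        S = [row[:] for row in memory[0]]
        z = memory[1][:]

    # Rank tokens by importance; retained tokens keep their original order.
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])

    # Compress each evicted token into the running state: S += phi(k) v^T, z += phi(k).
    for i in ranked[budget:]:
        phi = feature_map(keys[i])
        for a in range(d_k):
            z[a] += phi[a]
            for b in range(d_v):
                S[a][b] += phi[a] * values[i][b]

    return [keys[i] for i in keep], [values[i] for i in keep], (S, z)


def attend(query, keys, values, memory):
    """Softmax attention over retained KV pairs plus a residual readout from memory."""
    d_k, d_v = len(query), len(values[0])
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    out = [sum(w * v[b] for w, v in zip(weights, values)) / total for b in range(d_v)]

    # Residual readout (assumed to be a plain sum; a learned gate is equally plausible).
    S, z = memory
    phi = feature_map(query)
    denom = sum(p * zz for p, zz in zip(phi, z))
    if denom > 0:
        for b in range(d_v):
            out[b] += sum(phi[a] * S[a][b] for a in range(d_k)) / denom
    return out
```

In this sketch the retained cache never exceeds `budget` entries, while the state `(S, z)` has a fixed size independent of sequence length, which is the property the abstract describes as a bounded KV budget with a compact, online-updated memory. Passing the returned `memory` back into later calls to `evict_with_memory` accumulates newly evicted tokens into the same state.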


