2602.23592v1 Feb 27, 2026 cs.RO

KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning

Bo Yu (Citations: 38, h-index: 3)

Zebin Yang (Citations: 42, h-index: 3)

Tong Xie (Citations: 207, h-index: 5)

Baotong Lu (Citations: 143, h-index: 4)

Shaoshan Liu (Citations: 9, h-index: 2)

Meng Li (Citations: 38, h-index: 3)

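Memory-augmented large language models (LLMs) perform strongly on complex, long-horizon robot control tasks. By tracking past experiences and environment states, memory lets an LLM keep a global view and avoid repeated exploration. Existing approaches, however, mostly store memory as text, which makes prompts excessively long and slows prefill. KV caches can be stored and reused, but frequent KV cache updates greatly reduce the efficiency gain. This paper proposes KEEP, a KV-cache-centric memory management system for efficient robot control. KEEP has three main innovations: (1) a static-dynamic memory construction algorithm that reduces KV cache recomputation by using mixed-granularity memory groups; (2) a multi-hop memory re-computation algorithm that dynamically identifies important cross-attention between memory groups and iteratively reconstructs memory interactions; and (3) a layer-balanced memory loading scheme that removes imbalance in KV cache loading and cross-attention computation across layers. Experiments show that on the ALFRED dataset KEEP is 2.68x faster than text-based memory methods with negligible accuracy loss. Compared with the KV re-computation method CacheBlend (EuroSys'25), KEEP improves success rate by 4.13% and reduces time-to-first-token (TTFT) by 1.90x. The code is available at https://github.com/PKU-SEC-Lab/KEEP_Embodied_Memory.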

Original Abstract

Memory-augmented Large Language Models (LLMs) have demonstrated remarkable capability for complex, long-horizon embodied planning. By keeping track of past experiences and environmental states, memory enables LLMs to maintain a global view and thereby avoid repetitive exploration. However, existing approaches often store the memory as raw text, leading to excessively long prompts and high prefill latency. While it is possible to store and reuse the KV caches, the efficiency benefits are greatly undermined by frequent KV cache updates. In this paper, we propose KEEP, a KV-cache-centric memory management system for efficient embodied planning. KEEP features three key innovations: (1) a Static-Dynamic Memory Construction algorithm that reduces KV cache recomputation through mixed-granularity memory groups; (2) a Multi-hop Memory Re-computation algorithm that dynamically identifies important cross-attention among different memory groups and reconstructs memory interactions iteratively; and (3) a Layer-balanced Memory Loading scheme that eliminates unbalanced KV cache loading and cross-attention computation across different layers. Extensive experiments demonstrate that KEEP achieves a 2.68x speedup with negligible accuracy loss compared with text-based memory methods on the ALFRED dataset. Compared with the KV re-computation method CacheBlend (EuroSys'25), KEEP shows a 4.13% success rate improvement and a 1.90x time-to-first-token (TTFT) reduction. Our code is available at https://github.com/PKU-SEC-Lab/KEEP_Embodied_Memory.
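The abstract's point that frequent KV cache updates undermine reuse can be illustrated with a toy sketch. This is not the KEEP implementation or its Static-Dynamic Memory Construction algorithm. It only shows the general idea of caching per memory group, so that an update to a frequently changing group does not force recomputation of a rarely changing one. The class `GroupedMemory`, the helper `fake_prefill`, and the group names are all hypothetical, and a SHA-256 digest stands in for a real KV cache.

```python
import hashlib
from dataclasses import dataclass, field


def fake_prefill(text: str) -> str:
    """Stand-in for an expensive prefill pass; the digest plays the role of a KV cache."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class GroupedMemory:
    """Toy memory store that keeps one cached entry per memory group (hypothetical)."""
    texts: dict = field(default_factory=dict)   # group name -> current text
    cache: dict = field(default_factory=dict)   # group name -> cached stand-in KV
    prefill_calls: int = 0                      # how many times we paid for prefill

    def update(self, group: str, text: str) -> None:
        # Invalidate only the group whose content actually changed.
        if self.texts.get(group) != text:
            self.texts[group] = text
            self.cache.pop(group, None)

    def load(self) -> list:
        # Recompute entries that are missing and reuse every other entry.
        entries = []
        for group, text in self.texts.items():
            if group not in self.cache:
                self.cache[group] = fake_prefill(text)
                self.prefill_calls += 1
            entries.append(self.cache[group])
        return entries


def demo() -> int:
    mem = GroupedMemory()
    mem.update("static:room_layout", "kitchen has a fridge, a sink, and a counter")
    mem.update("dynamic:object_state", "apple is on the counter")
    mem.load()  # both groups are computed
    mem.update("dynamic:object_state", "apple is in the fridge")
    mem.load()  # only the dynamic group is recomputed
    return mem.prefill_calls


if __name__ == "__main__":
    print(demo())
```

Caching each group independently, as this sketch does, ignores attention between groups. The abstract indicates that KEEP handles that interaction with its Multi-hop Memory Re-computation algorithm, which this toy does not attempt to model.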

0 Citations

0 Influential

30.547189562171 Altmetric

152.7 Score
