2604.18933v1 Apr 21, 2026 cs.RO

Gated Memory Policy

Yihuai Gao

Citations: 422

h-index: 5

Shuang Li

Citations: 895

h-index: 11

Shuran Song

Citations: 237

h-index: 6

Jinyu Liu

Citations: 3

h-index: 1

Abstract

Robotic manipulation tasks exhibit varying memory requirements, ranging from Markovian tasks that require no memory to non-Markovian tasks that depend on historical information spanning single or multiple interaction trials. Surprisingly, simply extending observation histories of a visuomotor policy often leads to a significant performance drop due to distribution shift and overfitting. To address these issues, we propose Gated Memory Policy (GMP), a visuomotor policy that learns both when to recall memory and what to recall. To learn when to recall memory, GMP employs a learned memory gate mechanism that selectively activates history context only when necessary, improving robustness and reactivity. To learn what to recall efficiently, GMP introduces a lightweight cross-attention module that constructs effective latent memory representations. To further enhance robustness, GMP injects diffusion noise into historical actions, mitigating sensitivity to noisy or inaccurate histories during both training and inference. On our proposed non-Markovian benchmark MemMimic, GMP achieves a 30.1% average success rate improvement over long-history baselines, while maintaining competitive performance on Markovian tasks in RoboMimic. All code, data and in-the-wild deployment instructions are available on our project website https://gated-memory-policy.github.io/.
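The abstract names three ingredients: a gate that decides when to recall memory, a cross-attention module that decides what to recall, and noise injected into historical actions. The toy sketch below shows how such pieces could fit together. It is not the authors' implementation. Every function name is hypothetical, the gate is assumed to be a sigmoid over a scalar logit supplied by the caller (in GMP the gate is learned), the attention uses the memory tokens as both keys and values with no learned projections, and fixed-scale Gaussian noise stands in for the paper's diffusion noise.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, memory):
    """Single-head scaled dot-product attention of one query over memory tokens.

    Assumption: keys and values are the raw memory tokens (no projections).
    """
    if not memory:
        return [0.0] * len(query)
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, tok) * scale for tok in memory])
    dim = len(memory[0])
    return [sum(w * tok[i] for w, tok in zip(weights, memory)) for i in range(dim)]


def noisy_actions(actions, sigma, rng):
    """Perturb each past action with Gaussian noise (stand-in for diffusion noise)."""
    return [[a + rng.gauss(0.0, sigma) for a in act] for act in actions]


def gated_context(current, memory, gate_logit):
    """Add the recalled memory to the current feature, scaled by a sigmoid gate.

    Assumption: additive blending; the paper may combine the two differently.
    """
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    recalled = cross_attend(current, memory)
    return [c + gate * r for c, r in zip(current, recalled)], gate


rng = random.Random(0)
current = [1.0, 0.0]                      # current observation feature (toy)
past_actions = [[1.0, 0.0], [0.0, 1.0]]   # history tokens (toy)
memory = noisy_actions(past_actions, sigma=0.1, rng=rng)

closed, g_closed = gated_context(current, memory, gate_logit=-20.0)  # Markovian-like step
opened, g_open = gated_context(current, memory, gate_logit=20.0)     # memory-dependent step
```

With the gate near zero the output stays essentially equal to the current feature, which is the behavior the abstract describes for tasks that need no memory. With the gate near one the attended history is mixed in.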

3 Citations

0 Influential

5.5 Altmetric

30.5 Score
