2606.05761v1 Jun 04, 2026 cs.AI

SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

Mingyang Song
Haoyu Sun
Wenxuan Wang
Weinan Zhang
Yu Cheng
Fukuan Hou
Yang Yang

Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term interactions. As these memories grow, they may reinforce one another, diverge across contexts, or directly conflict, making correct assistance depend on memory relations rather than isolated recall. Existing long-term memory benchmarks rarely probe how agents preserve and utilize such relations during downstream tasks. To address this gap, we introduce SubtleMemory, a benchmark for fine-grained relational memory discrimination in long-running AI agents. SubtleMemory constructs relation-controlled latent semantic artifacts whose variants instantiate complementary, nuanced, or contradictory relations, and embeds them into realistic user-agent histories, requiring agents to recover distributed relational structures during later queries and instructions. The benchmark contains 1,522 evaluation instances over 10 long histories, grounded in 1,090 relation-controlled memory-variant sets and spanning user-related and non-user-related queries. Evaluating six standalone memory systems, two Claw-style agents with native memory modules, and three Claw-style agents with plugin memory modules, we find that current systems remain weak on fine-grained relational memory discrimination. We further introduce diagnostic protocols that reveal distinct capability profiles across memory preservation, retrieval, and downstream reasoning stages.


