2601.06966v1 Jan 11, 2026 cs.CL

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

Sen Hu (Citations: 135, h-index: 6)
Ronghao Chen (Citations: 130, h-index: 6)
Huacan Wang (Citations: 82, h-index: 5)
Xueran Han (Citations: 18, h-index: 3)
Zhiyuan Yao (Citations: 20, h-index: 3)
Zishan Xu (Citations: 36, h-index: 4)
Yifu Guo (Citations: 74, h-index: 3)
Haonan Bian (Citations: 21, h-index: 2)
Shaolei Zhang (Citations: 25, h-index: 3)
Ziliang Yang (Citations: 15, h-index: 2)

Abstract

As Large Language Models (LLMs) evolve from static dialogue interfaces to autonomous general agents, effective memory is paramount to ensuring long-term consistency. However, existing benchmarks primarily focus on casual conversation or task-oriented dialogue, failing to capture **"long-term project-oriented"** interactions where agents must track evolving goals. To bridge this gap, we introduce **RealMem**, the first benchmark grounded in realistic project scenarios. RealMem comprises over 2,000 cross-session dialogues across eleven scenarios, utilizing natural user queries for evaluation. We propose a synthesis pipeline that integrates Project Foundation Construction, Multi-Agent Dialogue Generation, and Memory and Schedule Management to simulate the dynamic evolution of memory. Experiments reveal that current memory systems face significant challenges in managing the long-term project states and dynamic context dependencies inherent in real-world projects. Our code and datasets are available at [https://github.com/AvatarMemory/RealMemBench](https://github.com/AvatarMemory/RealMemBench).

8 Citations

0 Influential

41.317808230648 Altmetric

214.6 Score
