2603.23840v1 Mar 25, 2026 cs.AI

VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents

Shuochen Liu

Yuhao Chen

Tong Xu

Yi Xu

Xinyu Ding

Xiangzhong Fang

Luxi Lin

Qingyu Zhang

Ya Li

Quan Liu

Abstract

With the growing demand for intelligent in-vehicle experiences, vehicle-based agents are evolving from simple assistants to long-term companions. This evolution requires agents to continuously model multi-user preferences and make reliable decisions in the face of inter-user preference conflicts and changing habits over time. However, existing benchmarks are largely limited to single-user, static question-answer settings, failing to capture the temporal evolution of preferences and the multi-user, tool-interactive nature of real vehicle environments. To address this gap, we introduce VehicleMemBench, a multi-user long-context memory benchmark built on an executable in-vehicle simulation environment. The benchmark evaluates tool use and memory by comparing the post-action environment state with a predefined target state, enabling objective and reproducible evaluation without LLM-based or human scoring. VehicleMemBench includes 23 tool modules, and each sample contains over 80 historical memory events. Experiments show that powerful models perform well on direct instruction tasks but struggle in scenarios involving memory evolution, particularly when user preferences change dynamically. Even advanced memory systems struggle to handle domain-specific memory requirements in this environment. These findings highlight the need for more robust and specialized memory management mechanisms to support long-term adaptive decision-making in real-world in-vehicle systems. To facilitate future research, we release the data and code.
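The evaluation idea described above, scoring an episode by comparing the environment state after the agent's tool calls with a predefined target state instead of asking an LLM or a human to judge the answer, can be sketched in a few lines of Python. This is a hypothetical illustration, not the released VehicleMemBench code: the flat dotted-key state, the setting names such as `climate.driver_temp`, the helpers `apply_tool_calls`, `diff_states`, and `passes`, and the use of exact equality over the whole state as the success criterion are all assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical flat view of the simulated vehicle: dotted setting names -> values.
# (Assumed representation; the real benchmark's state schema may differ.)
State = Dict[str, Any]

_MISSING = object()  # marks a setting that is absent from one of the two states


def apply_tool_calls(state: State, calls: List[Tuple[str, Any]]) -> State:
    """Return a copy of `state` with each hypothetical (setting, value) call applied in order."""
    new_state = dict(state)
    for setting, value in calls:
        new_state[setting] = value
    return new_state


def diff_states(final: State, target: State) -> List[Tuple[str, Any, Any]]:
    """List (setting, expected, actual) for every setting where final and target disagree."""
    mismatches = []
    for key in sorted(set(final) | set(target)):
        expected = target.get(key, _MISSING)
        actual = final.get(key, _MISSING)
        if expected != actual:
            mismatches.append((key, expected, actual))
    return mismatches


def passes(final: State, target: State) -> bool:
    """Deterministic pass/fail (assumed criterion): post-action state must equal the target."""
    return not diff_states(final, target)


if __name__ == "__main__":
    initial = {"climate.driver_temp": 22, "seat.driver_heating": 0, "media.volume": 10}
    # Hypothetical memory history: the driver once preferred 24 degrees but later
    # switched to 21, so the target state encodes the most recent preference.
    target = apply_tool_calls(initial, [("climate.driver_temp", 21)])

    up_to_date = apply_tool_calls(initial, [("climate.driver_temp", 21)])
    stale = apply_tool_calls(initial, [("climate.driver_temp", 24)])

    print(passes(up_to_date, target))  # True
    print(diff_states(stale, target))  # [('climate.driver_temp', 21, 24)]
```

Because the check is a pure function of two states, re-running it always yields the same verdict, which is the objective, reproducible property the abstract emphasizes. The `stale` run mirrors the failure mode reported for dynamically changing preferences: an agent that acts on an outdated memory leaves the vehicle in the wrong state, and the diff pinpoints exactly which setting is off.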
