2605.00356v1 May 01, 2026 cs.CL

MemRouter: Memory Management for Long-Term Conversational Agents via Embedding-Based Routing

MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

Song Wang (Citations: 9, h-index: 1)

Jing Ma (Citations: 21, h-index: 2)

Tianyu Hu (Citations: 71, h-index: 4)

Wei Lin (Citations: 84, h-index: 2)

Weizhi Zhang (Citations: 152, h-index: 2)

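Long-term conversational agents must decide which conversation turns to store in external memory, yet recent systems make that decision by running an autoregressive LLM (Large Language Model) at every turn. This work proposes MemRouter, a write-side memory router that separates the memory admission decision from the downstream answer-generation backbone and replaces per-turn memory-management decoding with an embedding-based routing policy. MemRouter encodes each turn together with recent context, projects the resulting embeddings through a frozen LLM backbone, and uses lightweight classification heads to predict whether the turn should be stored; only 12M parameters are trained. In a controlled comparison on LoCoMo, MemRouter outperforms an LLM-based memory manager on every question category (overall F1 52.0 vs 45.6, with non-overlapping 95% confidence intervals) and reduces p50 memory-management latency from 970ms to 58ms. Further analysis shows that the learned admission policy improves mean F1 by +10.3 over random storage, category-specific prompting adds +5.2 over a generic prompt, and retrieval contributes +0.7. These results suggest that write-side memory admission can be learned by a small supervised router, while answer generation remains a separate downstream component.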
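The write-side routing idea described above (encode a turn with recent context, pass it through a frozen map, train only a small store/skip head) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in, not the authors' implementation: `toy_embed` replaces a real encoder with a hashed bag-of-words, `FrozenProjection` replaces the frozen LLM backbone with a fixed random linear map, `AdmissionHead` is a single logistic unit, and the labeled turns are invented for illustration.

```python
import hashlib
import math
import random

DIM = 16  # hypothetical embedding width, chosen only for this sketch


def toy_embed(text):
    """Stand-in for a real text encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = hashlib.sha256(token.encode("utf-8")).digest()[0] % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FrozenProjection:
    """Stand-in for the frozen backbone: a fixed random linear map followed by tanh."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        self.weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


class AdmissionHead:
    """Logistic store/skip head; its weights are the only trained parameters here."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, h):
        z = sum(w * x for w, x in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def sgd_step(self, h, label, lr=0.5):
        err = self.prob(h) - label  # gradient of the log loss w.r.t. the logit
        self.w = [w - lr * err * x for w, x in zip(self.w, h)]
        self.b -= lr * err


def route_features(turn, recent_turns, backbone):
    """Embed the turn and its two most recent predecessors, then apply the frozen map."""
    context = " ".join(recent_turns[-2:])
    return backbone(toy_embed(turn) + toy_embed(context))


def mean_log_loss(head, examples):
    eps = 1e-12
    total = 0.0
    for h, y in examples:
        p = head.prob(h)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(examples)


# Hypothetical labeled turns (invented for this sketch): 1 = store, 0 = skip.
DIALOGUE = [
    ("hi there", 0),
    ("my sister Anna moves to Lisbon in March", 1),
    ("ok cool", 0),
    ("I am allergic to peanuts", 1),
]

backbone = FrozenProjection(2 * DIM)
head = AdmissionHead(2 * DIM)
examples, history = [], []
for text, label in DIALOGUE:
    examples.append((route_features(text, history, backbone), label))
    history.append(text)

frozen_before = [row[:] for row in backbone.weights]
loss_before = mean_log_loss(head, examples)
for _ in range(200):
    for h, y in examples:
        head.sgd_step(h, y)  # only the head is updated; the projection stays fixed
loss_after = mean_log_loss(head, examples)
store_probs = [head.prob(h) for h, _ in examples]
```

The point of the design is that an admission decision costs one forward pass and a dot product rather than an autoregressive decode, which is the kind of saving the reported latency reduction reflects. A real system would threshold `store_probs` to decide which turns enter external memory.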

Original Abstract

Long-term conversational agents must decide which turns to store in external memory, yet recent systems rely on autoregressive LLM generation at every turn to make that decision. We present MemRouter, a write-side memory router that decouples memory admission from the downstream answer backbone and replaces per-turn memory-management decoding with an embedding-based routing policy. MemRouter encodes each turn together with recent context, projects the resulting embeddings through a frozen LLM backbone, and predicts whether the turn should be stored using lightweight classification heads while training only 12M parameters. Under a controlled matched-harness comparison on LoCoMo, where the retrieval pipeline, answer prompts, and QA backbone (Qwen2.5-7B) are held identical, MemRouter outperforms an LLM-based memory manager on every question category (overall F1 52.0 vs 45.6, non-overlapping 95% CIs) while reducing memory-management p50 latency from 970ms to 58ms. Descriptive factorial averaging further shows that learned admission improves mean F1 by +10.3 over random storage, category-specific prompting adds +5.2 over a generic prompt, and retrieval contributes +0.7. These results suggest that write-side memory admission can be learned by a small supervised router, while answer generation remains a separate downstream component in long-horizon conversational QA.
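The abstract does not spell out how its "descriptive factorial averaging" is computed. One common reading, assumed here, is a main-effect contrast over a full grid of configurations: average the score over every cell where a factor is on, and subtract the average over every cell where it is off. The sketch below uses a hypothetical 2×2×2 grid (admission, prompt, retrieval) filled with synthetic additive values that are not results from the paper; the names `FACTORS` and `main_effect` are illustrative.

```python
from itertools import product
from statistics import mean

# Hypothetical factor order: 1 = learned admission / category-specific prompt / retrieval on.
FACTORS = ("admission", "prompt", "retrieval")


def main_effect(cells, factor_index):
    """Mean score with the factor on minus mean score with it off,
    averaging over all settings of the remaining factors."""
    on = [score for levels, score in cells.items() if levels[factor_index] == 1]
    off = [score for levels, score in cells.items() if levels[factor_index] == 0]
    return mean(on) - mean(off)


# Synthetic additive grid (NOT the paper's numbers): base 40 plus 8, 4, and 1 per factor.
synthetic = {
    levels: 40.0 + 8.0 * levels[0] + 4.0 * levels[1] + 1.0 * levels[2]
    for levels in product((0, 1), repeat=3)
}

effects = {name: main_effect(synthetic, i) for i, name in enumerate(FACTORS)}
```

Because the synthetic grid is purely additive, each contrast recovers the increment that was built in for that factor. With real scores, interactions between factors would be averaged into these contrasts, which is why such numbers are descriptive rather than causal estimates.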

0 Citations

0 Influential

2 Altmetric

10.0 Score
