2604.04503v1 Apr 06, 2026 cs.AI

Memory Intelligence Agent

Kun Shao

Citations: 31

h-index: 2

Xin Tan

Citations: 765

h-index: 12

Yuan Xie

Citations: 28

h-index: 3

Zhizhong Zhang

Citations: 281

h-index: 8

Jingyang Qiao

Citations: 68

h-index: 4

Weicheng Meng

Citations: 1

h-index: 1

Zhihang Lin

Citations: 23

h-index: 3

Jingyu Gong

Citations: 3

h-index: 1

Yu Cheng

Citations: 60

h-index: 4

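Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems let DRAs draw on past experience, which is essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, but they have two key limitations: ineffective memory evolution, and growing storage and retrieval costs. To address these problems, we propose a new Memory Intelligence Agent (MIA) framework built on a Manager-Planner-Executor architecture. The Memory Manager is a non-parametric memory system that stores compressed historical search trajectories. The Planner is a parametric memory agent that generates a search plan for a question. The Executor is another agent that searches for and analyzes information according to the search plan. To build the MIA framework, we first adopt an alternating reinforcement learning paradigm to improve cooperation between the Planner and the Executor. We also let the Planner keep evolving during test-time learning without interrupting the reasoning process. In addition, we establish a bidirectional conversion loop between parametric and non-parametric memory to achieve efficient memory evolution. Finally, we incorporate reflection and unsupervised judgment mechanisms to improve reasoning and self-evolution in open-world settings. Extensive experiments on eleven benchmarks demonstrate the superiority of MIA.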

Original Abstract

Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while suffering from key limitations of ineffective memory evolution and increasing storage and retrieval costs. To address these problems, we propose a novel Memory Intelligence Agent (MIA) framework, consisting of a Manager-Planner-Executor architecture. Memory Manager is a non-parametric memory system that can store compressed historical search trajectories. Planner is a parametric memory agent that can produce search plans for questions. Executor is another agent that can search and analyze information guided by the search plan. To build the MIA framework, we first adopt an alternating reinforcement learning paradigm to enhance cooperation between the Planner and the Executor. Furthermore, we enable the Planner to continuously evolve during test-time learning, with updates performed on-the-fly alongside inference without interrupting the reasoning process. Additionally, we establish a bidirectional conversion loop between parametric and non-parametric memories to achieve efficient memory evolution. Finally, we incorporate reflection and unsupervised judgment mechanisms to boost reasoning and self-evolution in the open world. Extensive experiments across eleven benchmarks demonstrate the superiority of MIA.
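
The Manager-Planner-Executor division of labor described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every class, method, and function name below is hypothetical, the word-overlap retrieval and string-joining "compression" are placeholder assumptions, and the Planner and Executor are plain stand-ins for what the abstract describes as agents. The sketch covers only the inference-time data flow; the alternating reinforcement learning, test-time updates, memory conversion loop, and reflection and judgment mechanisms are out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryManager:
    """Hypothetical non-parametric memory holding compressed trajectories."""
    trajectories: list[tuple[str, str]] = field(default_factory=list)

    def store(self, question: str, steps: list[str]) -> None:
        # Placeholder "compression": join the steps into a single string.
        self.trajectories.append((question, " > ".join(steps)))

    def retrieve(self, question: str, k: int = 1) -> list[str]:
        # Toy similarity (an assumption): count of words shared with the query.
        words = set(question.lower().split())
        ranked = sorted(
            self.trajectories,
            key=lambda item: len(words & set(item[0].lower().split())),
            reverse=True,
        )
        return [summary for _, summary in ranked[:k]]


class Planner:
    """Stand-in for the parametric planner; a real one would be a model."""

    def plan(self, question: str, hints: list[str]) -> list[str]:
        steps = [f"search: {question}"]
        if hints:
            # Reuse the most similar stored trajectory as a hint.
            steps.append(f"reuse: {hints[0]}")
        steps.append("answer")
        return steps


class Executor:
    """Stand-in for the executor; it only marks each plan step as done."""

    def run(self, plan: list[str]) -> list[str]:
        return [f"done[{step}]" for step in plan]


def answer(question: str, memory: MemoryManager,
           planner: Planner, executor: Executor) -> list[str]:
    hints = memory.retrieve(question)      # Manager supplies past experience
    plan = planner.plan(question, hints)   # Planner drafts the search plan
    trace = executor.run(plan)             # Executor follows the plan
    memory.store(question, plan)           # New trajectory goes back to memory
    return trace
```

Calling `answer` twice with the same `MemoryManager` shows the intended effect: the first call has no stored trajectory to reuse, while the second call's plan gains a `reuse:` step drawn from the first.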

0 Citations

0 Influential

6 Altmetric

30.0 Score
