2602.02369v1 Feb 02, 2026 cs.AI

Live-Evo: Online Evolution of Agentic Memory from Continuous Feedback

Yiran Wu (citations: 3,015; h-index: 9)
Huazheng Wang (citations: 477; h-index: 7)
Yi Yu (citations: 53; h-index: 4)
Qingyun Wu (citations: 259; h-index: 4)
Yao Zhang (citations: 135; h-index: 7)

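Large language model (LLM) agents are increasingly equipped with memory: stored experiences and reusable guidance that can improve task-solving performance. Recent self-evolving systems update memory based on interaction outcomes, but most existing evolution pipelines were developed for static train/test splits and merely approximate online learning using static benchmarks, which leaves them brittle under real distribution shift and continuous feedback. We introduce Live-Evo, an online self-evolving memory system that learns from a data stream arriving over time. Live-Evo separates "what happened" from "how to use it" through an Experience Bank and a Meta-Guideline Bank, and for each task it composes task-adaptive guidelines from retrieved experiences. To manage memory online, Live-Evo maintains experience weights and updates them through feedback. Much like reinforcement and forgetting in human memory, experiences that consistently help are reinforced and retrieved more often, while experiences that mislead or have lost their usefulness are down-weighted and gradually forgotten. On the live Prophet Arena benchmark, run over 10 weeks, Live-Evo improved the Brier score by 20.8% and increased market returns by 12.9%; it was also applied to deep-research benchmarks, where it showed consistent gains over strong baselines. The code is available at https://github.com/ag2ai/Live-Evo.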
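The weighting mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the update rule, so the multiplicative reinforce/penalize factors, the passive decay, the forgetting threshold, the token-overlap similarity, and every class and function name here (`Experience`, `ExperienceBank`, `compile_guideline`) are assumptions chosen for illustration. The guideline step, which in a real system would presumably involve an LLM, is reduced to plain string formatting.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    text: str            # "what happened": a stored observation or lesson
    weight: float = 1.0  # retrieval weight, adjusted by feedback


@dataclass
class ExperienceBank:
    # All hyperparameters are illustrative assumptions, not values from the paper.
    reinforce: float = 1.2     # multiplier when a used experience helped
    penalize: float = 0.7      # multiplier when a used experience hurt
    decay: float = 0.98        # passive decay applied to every experience per update
    forget_below: float = 0.1  # experiences under this weight are dropped
    items: list = field(default_factory=list)

    def add(self, text):
        self.items.append(Experience(text))

    @staticmethod
    def _similarity(a, b):
        # Jaccard overlap of lowercase tokens; a stand-in for an embedding model.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def retrieve(self, task, k=2):
        # Rank by similarity scaled by weight, so reinforced items surface more often.
        ranked = sorted(
            self.items,
            key=lambda e: self._similarity(task, e.text) * e.weight,
            reverse=True,
        )
        return ranked[:k]

    def update(self, used, helped):
        # Feedback step: adjust what was used, decay everything, then forget.
        used_ids = {id(e) for e in used}
        for e in self.items:
            if id(e) in used_ids:
                e.weight *= self.reinforce if helped else self.penalize
            e.weight *= self.decay
        self.items = [e for e in self.items if e.weight >= self.forget_below]


def compile_guideline(task, experiences):
    # "How to use it": hypothetical stand-in for the guideline-compilation step.
    lessons = "; ".join(e.text for e in experiences)
    return f"For task '{task}', keep in mind: {lessons}"


bank = ExperienceBank()
bank.add("polls overstate incumbents in election markets")
bank.add("election markets always follow the latest headline")
good, bad = bank.items

# Simulated feedback stream: the first experience keeps helping, the second keeps misleading.
for _ in range(6):
    bank.update([good], helped=True)
    bank.update([bad], helped=False)

guideline = compile_guideline("election markets forecast", bank.retrieve("election markets forecast"))
```

After the simulated feedback, the misleading experience has fallen below the threshold and been dropped, while the helpful one carries a weight above its initial value and dominates retrieval. A production system would likely cap weights and use semantic retrieval; both are omitted here for brevity.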

Original Abstract

Large language model (LLM) agents are increasingly equipped with memory, that is, stored experience and reusable guidance that can improve task-solving performance. Recent self-evolving systems update memory based on interaction outcomes, but most existing evolution pipelines are developed for static train/test splits and only approximate online learning by folding static benchmarks, making them brittle under true distribution shift and continuous feedback. We introduce Live-Evo, an online self-evolving memory system that learns from a stream of incoming data over time. Live-Evo decouples "what happened" from "how to use it" via an Experience Bank and a Meta-Guideline Bank, compiling task-adaptive guidelines from retrieved experiences for each task. To manage memory online, Live-Evo maintains experience weights and updates them from feedback: experiences that consistently help are reinforced and retrieved more often, while misleading or stale experiences are down-weighted and gradually forgotten, analogous to reinforcement and decay in human memory. On the live Prophet Arena benchmark over a 10-week horizon, Live-Evo improves Brier score by 20.8% and increases market returns by 12.9%, while also transferring to deep-research benchmarks with consistent gains over strong baselines. Our code is available at https://github.com/ag2ai/Live-Evo.
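For readers unfamiliar with the headline metric, the sketch below shows how a Brier score is computed for binary forecasts: it is the mean squared difference between predicted probabilities and realized outcomes, so lower is better. The abstract does not say how the 20.8% figure is calculated; the `relative_improvement` helper assumes it is a relative reduction against a baseline score. The forecasts and outcomes are made-up illustrative numbers, not data from the paper.

```python
def brier_score(probs, outcomes):
    # Mean squared error between forecast probabilities and 0/1 outcomes.
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def relative_improvement(baseline, improved):
    # Assumed reading of "improves Brier score by X%": fractional reduction vs. baseline.
    return (baseline - improved) / baseline


# Hypothetical forecasts for four binary events (1 = event occurred).
outcomes = [1, 0, 1, 1]
baseline_probs = [0.6, 0.4, 0.5, 0.7]
improved_probs = [0.7, 0.3, 0.6, 0.8]

baseline = brier_score(baseline_probs, outcomes)
improved = brier_score(improved_probs, outcomes)
gain = relative_improvement(baseline, improved)
print(f"baseline={baseline:.3f} improved={improved:.3f} relative gain={gain:.1%}")
```

Because the score penalizes squared error, a forecaster that moves each probability modestly toward the eventual outcome can see a large relative reduction, which is why relative Brier improvements are usually reported alongside a second measure such as market returns.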

3 Citations

0 Influential

24.5 Altmetric

125.5 Score
