2603.18743v1 Mar 19, 2026 cs.AI

Memento-Skills: Let Agents Design Agents

Yihang Chen

Citations: 215

h-index: 3

Xinle Yu

Citations: 58

h-index: 3

Huichi Zhou

Citations: 40

h-index: 3

Siyuan Guo

Jilin University

Citations: 355

h-index: 9

Zhongwei Yu

Citations: 33

h-index: 2

Ziqin Gong

Citations: 119

h-index: 4

Bowen Zhao

Citations: 32

h-index: 2

Zhixun Chen

Citations: 60

h-index: 3

Jinsong Li

Citations: 34

h-index: 2

Runyu Yang

Citations: 117

h-index: 6

Qiang Liu

Citations: 38

h-index: 3

Jianming Zhou

Citations: 31

h-index: 1

Chunyang Sun

Citations: 38

h-index: 2

Jun Wang

Citations: 106

h-index: 3

Anji Liu

Citations: 69

h-index: 5

Menglong Zhang

Citations: 31

h-index: 1

Na Wang

Citations: 35

h-index: 2

Original Abstract

We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with stateful prompts, where reusable skills (stored as structured markdown files) serve as persistent, evolving memory. These skills encode both behaviour and context, enabling the agent to carry forward knowledge across interactions. Starting from simple elementary skills (like Web search and terminal operations), the agent continually improves via the Read-Write Reflective Learning mechanism introduced in Memento 2 [wang2025memento2]. In the read phase, a behaviour-trainable skill router selects the most relevant skill conditioned on the current stateful prompt; in the write phase, the agent updates and expands its skill library based on new experience. This closed-loop design enables continual learning without updating LLM parameters, as all adaptation is realised through the evolution of externalised skills and prompts. Unlike prior approaches that rely on human-designed agents, Memento-Skills enables a generalist agent to design agents end-to-end for new tasks. Through iterative skill generation and refinement, the system progressively improves its own capabilities. Experiments on the General AI Assistants benchmark and Humanity's Last Exam demonstrate sustained gains, achieving 26.2% and 116.2% relative improvements in overall accuracy, respectively. Code is available at https://github.com/Memento-Teams/Memento-Skills.
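
The abstract describes skills as structured markdown files that are written to and read from a persistent library. The following is a minimal sketch of that idea, not the authors' implementation: the file layout, the function names `write_skill` and `read_skill`, and the word-overlap matching are all assumptions made for illustration. In particular, word overlap is only a placeholder for the trained skill router the paper describes.

```python
import tempfile
from pathlib import Path


def write_skill(library: Path, name: str, description: str, steps: list[str]) -> Path:
    """Write phase (hypothetical layout): persist a skill as a markdown file."""
    lines = [f"# {name}", "", f"Description: {description}", "", "## Steps"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    path = library / f"{name}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_skill(library: Path, task: str) -> str:
    """Read phase: return the text of the skill that best matches the task.

    Word overlap is a stand-in for the paper's trained router (assumption).
    """
    task_words = set(task.lower().split())

    def overlap(path: Path) -> int:
        skill_words = set(path.read_text(encoding="utf-8").lower().split())
        return len(task_words & skill_words)

    best = max(sorted(library.glob("*.md")), key=overlap)
    return best.read_text(encoding="utf-8")


# Start from two elementary skills, as in the abstract's examples.
library = Path(tempfile.mkdtemp())
write_skill(library, "web_search", "find facts on the web",
            ["form a query", "open the top results"])
write_skill(library, "terminal", "run shell commands",
            ["write the command", "inspect the output"])

selected = read_skill(library, "find facts about a city on the web")
print(selected.splitlines()[0])
```

Because the library is just a directory of text files, adapting the agent means editing or adding files, and the model weights are never touched.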

29 Citations

5 Influential

43.420948169591 Altmetric

256.1 Score

AI Analysis

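Summary

Memento-Skills is a continually learning, generalist LLM agent system in which the agent autonomously creates, revises, and evolves task-specific skills (agents) without parameter updates (frozen LLM). Through a Read-Write Reflective Learning loop over an external memory, the skill library, it learns behavior and context from experience. On the GAIA and HLE benchmarks it reports relative gains in overall accuracy of 26.2% and 116.2%, respectively, over static baselines, and it demonstrates strong cross-task transfer.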

Key Innovations

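Continual learning without parameter updates: LLM weights stay frozen while the system reads from and writes to a skill memory (executable code and prompt files), so the agent keeps improving without the compute cost of weight training.
Reflective, skill-level evolution (Self-Evolving): when a task fails, the system analyzes the logs and either patches the existing skill code or creates an entirely new skill and tests it automatically, dynamically expanding the library.
Behavior-aligned skill router (Behavior-aligned Router): a retrieval model trained with single-step offline reinforcement learning (InfoNCE) to predict the likelihood that executing a skill will succeed, rather than plain textual semantic similarity.
Practical implementation of the SRDP (Stateful Reflective Decision Process): treating episodic memory as reusable skill units preserves the Markov property within the evolving system and guarantees mathematical convergence.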
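
The summary names InfoNCE as the router's training objective but does not spell out the formulation. As a rough sketch under stated assumptions, the standard InfoNCE loss for a single task can be written as the negative log-softmax of the score assigned to the skill that actually succeeded. The function name `info_nce`, the `temperature` parameter, and the idea that `scores` come from some task-skill scoring model are illustrative assumptions, not details from the paper.

```python
import math


def info_nce(scores: list[float], positive: int, temperature: float = 1.0) -> float:
    """InfoNCE loss for one task: -log softmax(scores / temperature)[positive].

    scores[i] is the router's score for candidate skill i (assumed input);
    positive is the index of the skill whose execution succeeded.
    """
    logits = [s / temperature for s in scores]
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[positive]


# The loss falls as the successful skill is scored above the alternatives.
uniform = info_nce([0.0, 0.0, 0.0], positive=0)
confident = info_nce([5.0, 0.0, 0.0], positive=0)
print(uniform > confident)
```

Minimizing this loss pushes the score of the successful skill up relative to the other candidates, which is one way a router could be aligned with execution outcomes rather than text similarity.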

Learning & Inference Impact

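During learning, the system entirely avoids the costly backpropagation needed for weight optimization and instead uses a non-parametric approach that updates the files of the external skill library (prompts, code, and so on) based on its experience. This lets capabilities keep improving without large-scale compute and prevents catastrophic forgetting. During inference, the behavior-aligned router retrieves the tailored skill (workflow and context) with the highest probability of success for the input task. The frozen LLM only needs to follow the instructions specified in the retrieved skill, so as the skill library is refined over repeated learning loops, problem-solving accuracy and success rate at inference rise accordingly, forming a positive feedback loop.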
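
The feedback loop described above can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the paper's system: `frozen_llm` is a hypothetical stand-in that succeeds only when it is handed a skill mentioning the task, the library is a plain dictionary keyed by task, and the "reflection" step simply stores a new skill after a failure.

```python
from typing import Optional


def frozen_llm(task: str, skill: Optional[str]) -> bool:
    """Hypothetical frozen model: succeeds only if the skill covers the task."""
    return skill is not None and task in skill


def run_rounds(tasks: list[str], rounds: int) -> list[float]:
    """Repeat read-execute-write over the tasks and record the success rate."""
    library: dict[str, str] = {}  # external memory; no model weights change
    success_rates = []
    for _ in range(rounds):
        successes = 0
        for task in tasks:
            skill = library.get(task)  # read: retrieve a skill for the task
            if frozen_llm(task, skill):
                successes += 1
            else:
                # write: after a failure, store a new skill for this task
                library[task] = f"Workflow for {task}"
        success_rates.append(successes / len(tasks))
    return success_rates


rates = run_rounds(["parse a CSV file", "search the web"], rounds=2)
print(rates)  # [0.0, 1.0]
```

The model function is identical in both rounds; only the contents of the library differ, which is the sense in which learning here is non-parametric.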

Technical Difficulty

Advanced

Estimated implementation complexity based on methodology.
