2601.22758v1 Jan 30, 2026 cs.AI


AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement

Libin Qiu

Junfu Chen

Weizhi Huang

Wenkai Qiu

Zhirong Gao

Xiaobo Xue

Yuhang Ye

Shuo Tang

Original Abstract

Large language model agents often fail to accumulate knowledge from experience, treating each task as an independent challenge. Recent methods extract experience as flattened textual knowledge, which cannot capture procedural logic of complex subtasks. They also lack maintenance mechanisms, causing repository degradation as experience accumulates. We introduce AutoRefine, a framework that extracts and maintains dual-form Experience Patterns from agent execution histories. For procedural subtasks, we extract specialized subagents with independent reasoning and memory. For static knowledge, we extract skill patterns as guidelines or code snippets. A continuous maintenance mechanism scores, prunes, and merges patterns to prevent repository degradation. Evaluated on ALFWorld, ScienceWorld, and TravelPlanner, AutoRefine achieves 98.4%, 70.4%, and 27.1% respectively, with 20-73% step reductions. On TravelPlanner, automatic extraction exceeds manually designed systems (27.1% vs 12.1%), demonstrating its ability to capture procedural coordination.
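The abstract describes the maintenance mechanism only as "scores, prunes, and merges." The following is a rough, hedged sketch of what a repository holding both pattern forms (subagents for procedural subtasks, skills for static knowledge) and a single maintenance pass could look like. Everything in it is an illustrative assumption and does not come from the paper: the names `ExperiencePattern`, `PatternRepository`, `record_use` and `maintain`, the Laplace-smoothed success-rate score, the prune thresholds, and the token-overlap (Jaccard) merge criterion.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class ExperiencePattern:
    """One repository entry (hypothetical schema, not the paper's)."""
    name: str
    kind: Literal["subagent", "skill"]  # procedural subtask vs. static knowledge
    content: str                         # subagent instructions, guideline, or code snippet
    uses: int = 0
    successes: int = 0

    def score(self) -> float:
        # Assumed scoring rule: Laplace-smoothed success rate of episodes using this pattern.
        return (self.successes + 1) / (self.uses + 2)


def jaccard(a: str, b: str) -> float:
    # Token-set overlap, a stand-in similarity measure chosen for illustration.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


@dataclass
class PatternRepository:
    patterns: dict[str, ExperiencePattern] = field(default_factory=dict)

    def add(self, pattern: ExperiencePattern) -> None:
        self.patterns[pattern.name] = pattern

    def record_use(self, name: str, success: bool) -> None:
        # Update usage statistics after an episode that relied on this pattern.
        p = self.patterns[name]
        p.uses += 1
        p.successes += int(success)

    def prune(self, min_uses: int = 5, threshold: float = 0.3) -> list[str]:
        # Drop patterns that have been tried often enough and still score poorly.
        doomed = [n for n, p in self.patterns.items()
                  if p.uses >= min_uses and p.score() < threshold]
        for n in doomed:
            del self.patterns[n]
        return doomed

    def merge(self, similarity: float = 0.8) -> list[tuple[str, str]]:
        # Fold near-duplicate patterns of the same kind into the higher-scoring one.
        merged = []
        names = sorted(self.patterns, key=lambda n: -self.patterns[n].score())
        for i, keep in enumerate(names):
            if keep not in self.patterns:
                continue
            for other in names[i + 1:]:
                if other not in self.patterns:
                    continue
                a, b = self.patterns[keep], self.patterns[other]
                if a.kind == b.kind and jaccard(a.content, b.content) >= similarity:
                    a.uses += b.uses
                    a.successes += b.successes
                    del self.patterns[other]
                    merged.append((keep, other))
        return merged

    def maintain(self) -> None:
        # One maintenance pass: prune low scorers, then merge near-duplicates.
        self.prune()
        self.merge()


repo = PatternRepository()
repo.add(ExperiencePattern("find_item", "subagent", "search each receptacle until the item is found"))
repo.add(ExperiencePattern("find_item_v2", "subagent", "search each receptacle until the item is found"))
repo.add(ExperiencePattern("bad_tip", "skill", "always open the fridge first"))
for _ in range(6):
    repo.record_use("bad_tip", success=False)
repo.record_use("find_item", success=True)
repo.maintain()
print(sorted(repo.patterns))  # ['find_item']
```

In this toy run, `bad_tip` fails six times (score 1/8) and is pruned. The two identical `find_item` subagents are then merged into the higher-scoring one. The actual scoring signal and merge rule in AutoRefine may differ substantially.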

