2603.13131v1 Mar 13, 2026 cs.AI

Steve-Evolving: A Self-Evolving Agent for Open-World Environments via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

Jingwei Song

Citations: 12

h-index: 1

Ziyan Weng

Citations: 9

h-index: 2

Chenglong Li

Citations: 5

h-index: 1

Zikai Xiao

Citations: 110

h-index: 7

Zhen Xie

Citations: 51

h-index: 4

Vireo Zhang

Citations: 0

h-index: 0

Kun Wang

Citations: 114

h-index: 5

Zhisheng Chen

Citations: 68

h-index: 4

Jinhan Li

Citations: 56

h-index: 2

Jinhao Jing

Citations: 0

h-index: 0

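Agents operating in open-world environments must solve long-horizon tasks, where the main bottleneck is not the quality of single-step planning but how interaction experience is organized and evolved. To address this, we propose Steve-Evolving, a non-parametric self-evolving framework that couples fine-grained execution diagnosis with dual-track knowledge distillation. The method consists of three phases: Experience Anchoring, Experience Distillation, and Knowledge-Driven Closed-Loop Control. Specifically, Experience Anchoring converts the outcome of each subgoal attempt into a structured experience tuple with a fixed schema (pre-state, action, diagnosis result, post-state), stores it in a three-tier experience space with multi-dimensional indices (e.g., condition signatures, spatial hashing, semantic tags), and applies rolling summarization for efficient, auditable retrieval. To support attribution, the execution layer provides fine-grained diagnosis signals beyond binary outcomes, including state-difference summaries, enumerated failure causes, continuous indicators, and stagnation/loop detection. In Experience Distillation, successful trajectories are generalized into reusable skills with explicit preconditions and verification criteria, while failures are distilled into executable guardrails that capture root causes and prevent risky operations at both the subgoal and task levels. In Knowledge-Driven Closed-Loop Control, retrieved skills and guardrails are injected into the LLM planner, and diagnosis-triggered local replanning updates the active constraints online, forming a continual evolution process without model parameter updates. Experiments on the long-horizon task suite of Minecraft MCU show that the proposed method consistently outperforms existing static-retrieval baselines.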

Original Abstract

Open-world embodied agents must solve long-horizon tasks where the main bottleneck is not single-step planning quality but how interaction experience is organized and evolved. To this end, we present Steve-Evolving, a non-parametric self-evolving framework that tightly couples fine-grained execution diagnosis with dual-track knowledge distillation in a closed loop. The method follows three phases: Experience Anchoring, Experience Distillation, and Knowledge-Driven Closed-Loop Control. In detail, Experience Anchoring solidifies each subgoal attempt into a structured experience tuple with a fixed schema (pre-state, action, diagnosis-result, and post-state) and organizes it in a three-tier experience space with multi-dimensional indices (e.g., condition signatures, spatial hashing, and semantic tags) plus rolling summarization for efficient and auditable recall. To ensure sufficient information density for attribution, the execution layer provides compositional diagnosis signals beyond binary outcomes, including state-difference summaries, enumerated failure causes, continuous indicators, and stagnation/loop detection. In Experience Distillation, successful trajectories are generalized into reusable skills with explicit preconditions and verification criteria, while failures are distilled into executable guardrails that capture root causes and forbid risky operations at both subgoal and task granularities. Finally, in Knowledge-Driven Closed-Loop Control, retrieved skills and guardrails are injected into an LLM planner, and diagnosis-triggered local replanning updates the active constraints online, forming a continual evolution process without any model parameter updates. Experiments on the long-horizon suite of Minecraft MCU demonstrate consistent improvements over static-retrieval baselines.
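
To make the fixed experience schema and the dual-track distillation concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract names only the four tuple fields and the kinds of diagnosis signals, so every class name, field name, failure-cause label, and the Minecraft-style example values below are assumptions made for illustration. The three-tier experience space, indexing, and rolling summarization are omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureCause(Enum):
    # Hypothetical labels; the abstract only says failure causes are enumerated.
    MISSING_ITEM = "missing_item"
    PATH_BLOCKED = "path_blocked"


@dataclass
class Diagnosis:
    success: bool                                # binary outcome
    state_diff: dict                             # state-difference summary
    failure_cause: Optional[FailureCause] = None
    progress: float = 0.0                        # continuous indicator (assumed in [0, 1])
    stagnated: bool = False                      # stagnation/loop flag


@dataclass
class Experience:
    # The four schema fields named in the abstract.
    pre_state: dict
    action: str
    diagnosis: Diagnosis
    post_state: dict


@dataclass
class Skill:
    action: str
    preconditions: dict   # assumed: the pre-state of a successful attempt
    verification: dict    # assumed: the observed state change


@dataclass
class Guardrail:
    forbidden_action: str
    root_cause: FailureCause
    when: dict            # assumed: the pre-state in which the action failed


def distill(experiences):
    """Route successes to skills and diagnosed failures to guardrails."""
    skills, guardrails = [], []
    for exp in experiences:
        d = exp.diagnosis
        if d.success:
            skills.append(Skill(exp.action, exp.pre_state, d.state_diff))
        elif d.failure_cause is not None:
            guardrails.append(Guardrail(exp.action, d.failure_cause, exp.pre_state))
    return skills, guardrails


# Hypothetical usage with made-up states.
ok = Experience({"has_pickaxe": True}, "mine_iron_ore",
                Diagnosis(True, {"iron_ore": 1}, progress=1.0),
                {"has_pickaxe": True, "iron_ore": 1})
bad = Experience({"has_pickaxe": False}, "mine_iron_ore",
                 Diagnosis(False, {}, FailureCause.MISSING_ITEM),
                 {"has_pickaxe": False})
skills, guardrails = distill([ok, bad])
```

In this sketch the same action yields a skill in one context and a guardrail in another, which is why both records keep the pre-state. A planner prompt could then list matching skills as suggestions and matching guardrails as constraints, though how the paper performs retrieval and injection is not specified in the abstract.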

0 Citations

0 Influential

3.5 Altmetric

17.5 Score
