2603.18620v1 Mar 19, 2026 cs.CL

Learning to Self-Evolve

Canwen Xu

Citations: 92

h-index: 4

Boyi Liu

Citations: 23

h-index: 2

Yite Wang

Citations: 57

h-index: 2

Yuxiong He

Citations: 1,644

h-index: 19

Zhewei Yao

Citations: 124

h-index: 5

Xiaoyin Chen

Citations: 17

h-index: 2

Abstract

We introduce Learning to Self-Evolve (LSE), a reinforcement learning framework that trains large language models (LLMs) to improve their own contexts at test time. We situate LSE in the setting of test-time self-evolution, where a model iteratively refines its context from feedback on seen problems to perform better on new ones. Existing approaches rely entirely on the inherent reasoning ability of the model and never explicitly train it for this task. LSE reduces the multi-step evolution problem to a single-step RL objective, where each context edit is rewarded by the improvement in downstream performance. We pair this objective with a tree-guided evolution loop. On Text-to-SQL generation (BIRD) and general question answering (MMLU-Redux), a 4B-parameter model trained with LSE outperforms self-evolving policies powered by GPT-5 and Claude Sonnet 4.5, as well as prompt optimization methods including GEPA and TextGrad, and transfers to guide other models without additional training. Our results highlight the effectiveness of treating self-evolution as a learnable skill.
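The abstract describes two ingredients: a single-step reward equal to the improvement in downstream performance, and a tree-guided evolution loop. The Python sketch below illustrates how these might fit together on a toy problem. It is not the authors' implementation. The `Node` structure, the greedy best-first selection rule, and the helpers `toy_evaluate` and `toy_propose_edit` are hypothetical, introduced only for illustration; the abstract does not specify how the tree is searched or how edits are proposed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One candidate context in the evolution tree (hypothetical structure)."""
    context: str
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def edit_reward(score_before: float, score_after: float) -> float:
    """Single-step reward: change in downstream performance caused by one edit."""
    return score_after - score_before


def evolve(
    root_context: str,
    propose_edit: Callable[[str], str],
    evaluate: Callable[[str], float],
    steps: int,
) -> Tuple[Node, List[float]]:
    """Grow a tree of contexts and return the best node plus per-edit rewards."""
    root = Node(root_context, evaluate(root_context))
    nodes = [root]
    rewards: List[float] = []
    for _ in range(steps):
        # Assumption: expand the highest-scoring node seen so far.
        parent = max(nodes, key=lambda n: n.score)
        new_context = propose_edit(parent.context)
        child = Node(new_context, evaluate(new_context), parent=parent)
        parent.children.append(child)
        nodes.append(child)
        rewards.append(edit_reward(parent.score, child.score))
    return max(nodes, key=lambda n: n.score), rewards


# Toy stand-ins (assumptions): a real setup would score the context on held-out
# problems and use a trained model to propose the edit.
HINTS = ["quote identifiers", "check joins", "use LIMIT"]


def toy_evaluate(context: str) -> float:
    """Fraction of useful hints present in the context."""
    return sum(h in context for h in HINTS) / len(HINTS)


def toy_propose_edit(context: str) -> str:
    """Append the first hint that is still missing, if any."""
    for h in HINTS:
        if h not in context:
            return context + "\n- " + h
    return context


if __name__ == "__main__":
    best, rewards = evolve("Rules:", toy_propose_edit, toy_evaluate, steps=4)
    print(best.context)
    print(rewards)
```

In this toy run each useful edit earns a positive reward and an edit that changes nothing earns zero, which is the property that lets every edit be scored on its own rather than only at the end of a multi-step trajectory.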

1 Citation

0 Influential

9.5 Altmetric

48.5 Score
