2602.07839v1 Feb 08, 2026 cs.CL

TodoEvolve: Learning to Architect Agent Planning Systems

Zihan Zhang (Citations: 72, h-index: 6)

Jiaxi Liu (Citations: 1, h-index: 1)

Yan Jiang (Citations: 5, h-index: 1)

Guibin Zhang (Citations: 4, h-index: 1)

Heng Chang (Citations: 44, h-index: 1)

Zhenfei Yin (Citations: 58, h-index: 4)

Qibing Ren (Citations: 91, h-index: 4)

Junchi Yan (Citations: 466, h-index: 6)

Abstract

Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
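
The abstract names four components of the PlanFactory design space (topology, initialization, adaptation, navigation) but does not show its interface. As a rough illustration only, a planner split along those four axes might look like the sketch below. Every name here (`Plan`, `PlanningSystem`, `linear_init`, `append_on_failure`, `first_open`, `run`) is hypothetical and is not taken from the paper or its codebase; the semicolon-based task splitting and the retry rule are placeholder assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Plan:
    """Hypothetical plan state: step descriptions and completion flags."""
    steps: List[str] = field(default_factory=list)
    done: List[bool] = field(default_factory=list)


@dataclass
class PlanningSystem:
    """Hypothetical bundle of the four components named in the abstract."""
    topology: str                               # label only, e.g. "linear"
    initialize: Callable[[str], Plan]           # task -> initial plan
    adapt: Callable[[Plan, str], Plan]          # revise plan from feedback
    navigate: Callable[[Plan], Optional[int]]   # choose the next step index


def linear_init(task: str) -> Plan:
    # Assumption: the task string lists steps separated by ';'.
    steps = [s.strip() for s in task.split(";") if s.strip()]
    return Plan(steps=steps, done=[False] * len(steps))


def append_on_failure(plan: Plan, feedback: str) -> Plan:
    # Assumption: feedback beginning with "fail" triggers one extra step.
    if feedback.startswith("fail"):
        plan.steps.append("retry: " + feedback)
        plan.done.append(False)
    return plan


def first_open(plan: Plan) -> Optional[int]:
    # Return the index of the first unfinished step, or None when finished.
    for i, finished in enumerate(plan.done):
        if not finished:
            return i
    return None


def run(system: PlanningSystem, task: str,
        execute: Callable[[str], str], max_steps: int = 20) -> Plan:
    """Drive the plan until no step is open or the step budget runs out."""
    plan = system.initialize(task)
    for _ in range(max_steps):
        i = system.navigate(plan)
        if i is None:
            break
        feedback = execute(plan.steps[i])
        plan.done[i] = True
        plan = system.adapt(plan, feedback)
    return plan


if __name__ == "__main__":
    system = PlanningSystem("linear", linear_init, append_on_failure, first_open)
    result = run(system, "search; read; answer", lambda step: "ok")
    print(result.steps)
```

Under this decomposition, swapping a single callable (for example, a different `adapt` rule) yields a different planner without touching the driver loop, which is one way to read the abstract's "common interface for heterogeneous planning patterns".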
