2603.04900v1 Mar 05, 2026 cs.AI

EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection

Soyeon Caren Han (Citations: 30, h-index: 3)

Xueqi Ma (Citations: 23, h-index: 3)

Yan Li (Citations: 82, h-index: 5)

Mohammad Reza Ghasemi Madani (Citations: 175, h-index: 3)

Eduard H. Hovy (Citations: 36, h-index: 2)

Shuo Yang (Citations: 14, h-index: 1)

Abstract

LLM-based agents depend on effective tool-use policies to solve complex tasks, yet optimizing these policies remains challenging because supervision is delayed and credit assignment is difficult in long-horizon trajectories. Existing optimization approaches tend to be either monolithic, which makes them prone to entangling behaviors, or single-aspect, which means they ignore cross-module error propagation. To address these limitations, we propose EvoTool, a self-evolving framework that optimizes a modular tool-use policy via a gradient-free evolutionary paradigm. EvoTool decomposes the agent's tool-use policy into four modules (Planner, Selector, Caller, and Synthesizer) and iteratively improves them in a self-improving loop through three novel mechanisms. Trajectory-Grounded Blame Attribution uses diagnostic traces to localize failures to a specific module. Feedback-Guided Targeted Mutation then edits only that module via natural-language critique. Diversity-Aware Population Selection preserves complementary candidates to ensure solution diversity. Across four benchmarks, EvoTool outperforms strong baselines by over 5 points on both GPT-4.1 and Qwen3-8B, while achieving superior efficiency and transferability. The code will be released once the paper is accepted.
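The loop described in the abstract can be pictured with a small toy sketch. This is not the authors' implementation, since the code has not been released. Everything here is a hypothetical stand-in: the task list, the keyword-matching `rollout`, the helper names (`attribute_blame`, `mutate`, `select`, `evolve`), the round-robin parent choice, and in particular the selection rule (keep the best candidate per task, then fill by total score), which is only one plausible reading of "preserves complementary candidates". In a real system the rollout, blame, and mutation steps would be LLM calls.

```python
MODULES = ("planner", "selector", "caller", "synthesizer")

# Toy tasks (hypothetical): each names the module it stresses and the hint
# that module's prompt must contain for the task to succeed.
TASKS = [("planner", "decompose"), ("selector", "filter"),
         ("caller", "validate"), ("synthesizer", "cite")]


def rollout(policy, task):
    """Run one toy task and return (score, diagnostic trace)."""
    module, hint = task
    ok = hint in policy[module]
    trace = {"failed_module": None if ok else module, "critique": hint}
    return (1.0 if ok else 0.0), trace


def attribute_blame(trace):
    """Stand-in for blame attribution: localize the failure to one module."""
    return trace["failed_module"]


def mutate(policy, module, critique):
    """Stand-in for targeted mutation: edit only the blamed module's prompt."""
    child = dict(policy)
    child[module] = f"{policy[module]} {critique}"
    return child


def evaluate(policy):
    """Per-task scores for one policy."""
    return [rollout(policy, task)[0] for task in TASKS]


def select(population, scores, k):
    """Keep the best candidate per task first, then fill by total score."""
    indices = range(len(population))
    keep = []
    for t in range(len(scores[0])):
        # Ties on a task are broken by total score.
        winner = max(indices, key=lambda i: (scores[i][t], sum(scores[i])))
        if winner not in keep:
            keep.append(winner)
    for i in sorted(indices, key=lambda i: -sum(scores[i])):
        if i not in keep:
            keep.append(i)
    return [population[i] for i in keep[:k]]


def evolve(seed, generations=12, k=3):
    """Gradient-free loop: roll out, blame one module, mutate it, select."""
    population = [seed]
    for g in range(generations):
        parent = population[g % len(population)]  # deterministic round-robin
        results = [rollout(parent, task) for task in TASKS]
        failures = [trace for score, trace in results if score == 0.0]
        if not failures:
            continue
        trace = failures[0]
        child = mutate(parent, attribute_blame(trace), trace["critique"])
        if child in population:  # skip duplicates of existing candidates
            continue
        population.append(child)
        population = select(population, [evaluate(p) for p in population], k)
    return max(population, key=lambda p: sum(evaluate(p)))


seed = {m: f"You are the {m}." for m in MODULES}
best = evolve(seed)
```

In this toy, each mutation touches exactly one module's prompt, so an improvement or regression can be traced back to a single edit. The per-task-winner rule in `select` keeps a specialist that solves a task no other candidate solves, even when its total score is lower than that of a near-duplicate generalist.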

13 Citations

1 Influential

2.5 Altmetric

27.5 Score
