2604.15709v1 Apr 17, 2026 cs.AI

Optimizing Agent Skills with Monte Carlo Tree Search: A Bilevel Optimization Approach

Bilevel Optimization of Agent Skills via Monte Carlo Tree Search

Haoting Zhang (Citations: 51, h-index: 4)

Jing Xu (Citations: 3, h-index: 1)

Chen Huang (Citations: 322, h-index: 10)

Zeyu Zheng (Citations: 34, h-index: 4)

Yunduan Lin (Citations: 103, h-index: 3)

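Agent skills are structured collections of instructions, tools, and supporting resources that help large language model (LLM) agents perform particular tasks. Empirical evidence indicates that skill design strongly affects agent task performance, yet optimizing skills systematically remains difficult. Because a skill contains instructions, tools, and supporting resources in a structured way, optimizing it requires deciding both the structure of these components and the content of each component at the same time. This creates a complex decision space with strong interdependence between structure and components. We therefore define these two closely related decisions (skill structure and component content) and formulate skill optimization as a bilevel optimization problem. We propose a bilevel optimization framework that uses Monte Carlo Tree Search in the outer loop to determine the skill structure and, in the inner loop, refines the component content within the structure selected by the outer loop. Both loops use LLMs to support the optimization procedure. Evaluated on an open-source Operations Research question-answering dataset, the experimental results suggest that the bilevel optimization framework improves the performance of agents equipped with the optimized skill.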

Original Abstract

Agent skills are structured collections of instructions, tools, and supporting resources that help large language model (LLM) agents perform particular classes of tasks. Empirical evidence shows that the design of skills can materially affect agent task performance, yet systematically optimizing skills remains challenging. Since a skill comprises instructions, tools, and supporting resources in a structured way, optimizing it requires jointly determining both the structure of these components and the content each component contains. This gives rise to a complex decision space with strong interdependence across structure and components. We therefore represent these two coupled decisions as skill structure and component content, and formulate skill optimization as a bilevel optimization problem. We propose a bilevel optimization framework in which an outer loop employs Monte Carlo Tree Search to determine the skill structure, while an inner loop refines the component content within the structure selected by the outer loop. In both loops, we employ LLMs to assist the optimization procedure. We evaluate the proposed framework on an open-source Operations Research Question Answering dataset, and the experimental results suggest that the bilevel optimization framework improves the performance of the agents with the optimized skill.
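The outer/inner split described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the component names, the integer "content levels", the scoring function `evaluate`, and the coordinate-wise `refine_content` are all hypothetical stand-ins (in the paper, LLMs assist both loops and the score would come from running the agent on a dataset). The sketch only shows the control flow: a Monte Carlo Tree Search over include/exclude decisions for the structure, with an inner refinement of content run for every structure the search evaluates.

```python
import math
import random

COMPONENTS = ("instructions", "tools", "resources")  # hypothetical component slots
LEVELS = range(4)  # toy stand-in for the content written into a component


def evaluate(structure, content):
    """Toy score standing in for agent performance with this skill (assumption)."""
    target = {"instructions": 3, "tools": 2, "resources": 1}
    # Structure and content interact: tools are worth more when instructions exist.
    weight = {"instructions": 0.5,
              "tools": 0.4 if "instructions" in structure else 0.1,
              "resources": 0.05}
    quality = sum(weight[c] * (1 - abs(content[c] - target[c]) / 3) for c in structure)
    return quality - 0.1 * len(structure)  # each extra component costs context


def refine_content(structure, sweeps=2):
    """Inner loop: coordinate-wise search over content (stand-in for LLM refinement)."""
    content = {c: 0 for c in structure}
    for _ in range(sweeps):
        for c in structure:
            content[c] = max(LEVELS, key=lambda lvl: evaluate(structure, {**content, c: lvl}))
    return content, evaluate(structure, content)


class Node:
    def __init__(self, decisions, parent=None):
        self.decisions = decisions  # tuple of include/exclude choices made so far
        self.parent = parent
        self.children = {}
        self.visits = 0
        self.total = 0.0


def optimize_skill(iterations=200, c_ucb=0.5, seed=0):
    """Outer loop: MCTS (UCB1 selection) over which components the skill contains."""
    rng = random.Random(seed)
    root = Node(())
    best = (float("-inf"), None, None)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is non-terminal and fully expanded.
        while len(node.decisions) < len(COMPONENTS) and len(node.children) == 2:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.total / ch.visits
                       + c_ucb * math.sqrt(math.log(parent.visits) / ch.visits))
        # Expansion: add one untried include/exclude decision.
        if len(node.decisions) < len(COMPONENTS):
            action = rng.choice([a for a in (True, False) if a not in node.children])
            node.children[action] = Node(node.decisions + (action,), parent=node)
            node = node.children[action]
        # Rollout: finish the structure at random, then run the inner loop on it.
        missing = len(COMPONENTS) - len(node.decisions)
        decisions = node.decisions + tuple(rng.random() < 0.5 for _ in range(missing))
        structure = tuple(c for c, keep in zip(COMPONENTS, decisions) if keep)
        content, reward = refine_content(structure)
        if reward > best[0]:
            best = (reward, structure, content)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best


if __name__ == "__main__":
    score, structure, content = optimize_skill()
    print(structure, content, round(score, 3))
```

In this toy setting the reward passed back to the tree is always the score after inner refinement, so the outer search compares structures by the best content found for each of them, which is the defining feature of a bilevel formulation. With a real agent, each call to `evaluate` would be an expensive benchmark run, so the number of inner refinement steps per rollout becomes a significant budget decision.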

2 Citations

0 Influential

5 Altmetric

27.0 Score
