2602.04248v1 Feb 04, 2026 cs.AI

Empirical-MCTS: Continuous Agent Evolution via Dual-Experience Monte Carlo Tree Search

Hao Lu (Citations: 237, h-index: 3)

Ningxin Zhu (Citations: 7, h-index: 2)

Haoyuan Huang (Citations: 7, h-index: 2)

Yulin Zhou (Citations: 72, h-index: 4)

Chen Li (Citations: 6, h-index: 2)

Original Abstract

Inference-time scaling strategies, particularly Monte Carlo Tree Search (MCTS), have significantly enhanced the reasoning capabilities of Large Language Models (LLMs). However, current approaches remain predominantly stateless, discarding successful reasoning patterns after each problem instance and failing to mimic the empirical accumulation of wisdom characteristic of human problem-solving. To bridge this gap, we introduce Empirical-MCTS, a dual-loop framework that transforms stateless search into a continuous, non-parametric learning process. The framework unifies local exploration with global memory optimization through two novel mechanisms: Pairwise-Experience-Evolutionary Meta-Prompting (PE-EMP) and a Memory Optimization Agent. PE-EMP functions as a reflexive optimizer within the local search, utilizing pairwise feedback to dynamically synthesize adaptive criteria and evolve meta-prompts (system prompts) in real-time. Simultaneously, the Memory Optimization Agent manages a global repository as a dynamic policy prior, employing atomic operations to distill high-quality insights across problems. Extensive evaluations on complex reasoning benchmarks, including AIME25, ARC-AGI-2, and MathArena Apex, demonstrate that Empirical-MCTS significantly outperforms both stateless MCTS strategies and standalone experience-driven agents. These results underscore the critical necessity of coupling structured search with empirical accumulation for mastering complex, open-ended reasoning tasks.
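
The abstract describes a global repository that is edited through atomic operations and then reused as a prior for later problems, but it does not name those operations or the storage format. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the class `ExperienceMemory`, the operation names `add`, `update`, and `delete`, the numeric `score`, the helper `pairwise_update`, and the prompt layout in `build_system_prompt` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Insight:
    text: str
    score: float  # hypothetical quality estimate; higher is better


@dataclass
class ExperienceMemory:
    """Toy global repository. The operation names are assumptions."""
    items: dict = field(default_factory=dict)  # maps int id -> Insight
    next_id: int = 0

    def add(self, text: str, score: float) -> int:
        """Store a new insight and return its id."""
        key = self.next_id
        self.items[key] = Insight(text, score)
        self.next_id += 1
        return key

    def update(self, key: int, text: str, score: float) -> None:
        """Replace an existing insight; fail loudly on unknown ids."""
        if key not in self.items:
            raise KeyError(key)
        self.items[key] = Insight(text, score)

    def delete(self, key: int) -> None:
        """Drop an insight that is no longer considered useful."""
        del self.items[key]

    def top_k(self, k: int) -> list:
        """Return the texts of the k highest-scored insights."""
        ranked = sorted(self.items.values(), key=lambda i: i.score, reverse=True)
        return [i.text for i in ranked[:k]]


def pairwise_update(memory: ExperienceMemory, parent_score: float,
                    child_score: float, lesson: str) -> Optional[int]:
    """Keep a lesson only when the child answer beat its parent (assumed rule)."""
    if child_score > parent_score:
        return memory.add(lesson, child_score)
    return None


def build_system_prompt(base: str, memory: ExperienceMemory, k: int = 2) -> str:
    """Prepend the k best stored insights to a base prompt, acting as a prior."""
    hints = memory.top_k(k)
    if not hints:
        return base
    bullets = "\n".join(f"- {h}" for h in hints)
    return f"{base}\nInsights from earlier problems:\n{bullets}"
```

In this sketch the memory persists across problems while each search only reads from it through `build_system_prompt` and writes to it through `pairwise_update`, which is one simple way to separate a local loop from a global one. A real system would obtain the scores and lessons from model-generated pairwise feedback rather than from fixed numbers.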

Citations: 2; Influential: 0; Altmetric: 2; Score: 12.0
