2604.08124v1 Apr 09, 2026 cs.AI

Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search

Guochao Jiang

Guohua Liu

Yuewei Zhang

Chuzhan Hao

Wenfeng Feng

Guofeng Quan

Abstract

Reinforcement learning (RL) has become an effective approach for advancing the reasoning capabilities of large language models (LLMs) through the strategic integration of external search engines. However, current RL-based search agents often rely on a process of stochastic exploration guided by carefully crafted outcome rewards, leading to inefficient reasoning trajectories and unstable training. To address these issues, we propose a novel framework, Hierarchical Experience (HiExp), to enhance the performance and training stability of search agents. Specifically, we extract empirical knowledge through contrastive analysis and a multi-level clustering mechanism, transforming raw reasoning trajectories into hierarchical experience knowledge. By leveraging experience-aligned training, we effectively regularize stochastic exploration, evolving it into a strategic and experience-driven search process. Extensive evaluations on multiple complex agentic search and mathematical reasoning benchmarks demonstrate that our approach not only achieves substantial performance gains but also exhibits strong cross-task and cross-algorithm generalization.
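The abstract does not say how the contrastive analysis or the multi-level clustering are actually implemented. As a rough intuition only, the toy Python sketch below contrasts successful and failed trajectories, keeping steps that occur more often in successes, and then groups those steps in two levels (task type, then step category). Everything in it is a hypothetical assumption, not HiExp's actual procedure: the `Trajectory` fields, the `category:detail` step format, the frequency-ratio contrast, and the plain key-based grouping that stands in for real clustering.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record of one reasoning trajectory (not from the paper).
    task_type: str    # coarse label, e.g. "multi-hop" or "math"
    steps: list[str]  # abstract actions in "category:detail" form
    success: bool     # True if the outcome reward was positive


def contrastive_experience(trajs: list[Trajectory]) -> set[str]:
    """Keep steps whose occurrence rate is higher in successful
    trajectories than in failed ones (a simple contrast, assumed here)."""
    succ = Counter(s for t in trajs if t.success for s in set(t.steps))
    fail = Counter(s for t in trajs if not t.success for s in set(t.steps))
    n_succ = max(1, sum(t.success for t in trajs))
    n_fail = max(1, sum(not t.success for t in trajs))
    # Counter returns 0 for steps that never appear in failures.
    return {s for s in succ if succ[s] / n_succ > fail[s] / n_fail}


def hierarchical_experience(trajs: list[Trajectory]) -> dict[str, dict[str, list[str]]]:
    """Two-level grouping: task type -> step category -> experience steps.
    Key-based grouping is used as a stand-in for multi-level clustering."""
    by_task: dict[str, list[Trajectory]] = defaultdict(list)
    for t in trajs:
        by_task[t.task_type].append(t)

    tree: dict[str, dict[str, list[str]]] = {}
    for task, group in by_task.items():
        level2: dict[str, list[str]] = defaultdict(list)
        for step in sorted(contrastive_experience(group)):
            level2[step.split(":")[0]].append(step)
        tree[task] = dict(level2)
    return tree


if __name__ == "__main__":
    demo = [
        Trajectory("multi-hop", ["search:entity", "search:relation", "answer"], True),
        Trajectory("multi-hop", ["search:entity", "answer"], False),
        Trajectory("math", ["compute:decompose", "compute:verify", "answer"], True),
        Trajectory("math", ["compute:decompose", "answer"], False),
    ]
    print(hierarchical_experience(demo))
```

On the toy data, steps shared by successes and failures (such as `search:entity`) cancel out, and only the distinguishing steps (`search:relation`, `compute:verify`) survive as "experience". A real system would presumably summarize trajectories with an LLM and cluster embeddings rather than match string prefixes.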