2601.04888v1 Jan 08, 2026 cs.AI

SmartSearch: Process Reward-Guided Query Refinement for Search Agents

Tongyu Wen

Citations: 29

h-index: 2

Guanting Dong

Citations: 1,152

h-index: 13

Zhicheng Dou

Citations: 2,389

h-index: 24

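Search agents based on large language models (LLMs) have shown promise on knowledge-intensive problems by integrating information retrieval. Prior work has focused mainly on optimizing the reasoning paradigms of search agents while overlooking the quality of the intermediate search queries issued during reasoning. As a result, generated queries are often inaccurate, which leads to unexpected retrieval results and ultimately limits the overall effectiveness of the agent. To address this, we propose SmartSearch, a framework built on two key mechanisms: (1) process rewards, which provide fine-grained supervision over the quality of each intermediate search query through Dual-Level Credit Assessment; and (2) query refinement, which improves query generation by selectively refining low-quality search queries and regenerating the subsequent search rounds from those refinements. We also design a three-stage curriculum learning framework so that, guided by process rewards, the search agent progressively acquires the ability to improve query quality. This framework leads the agent from imitation, to alignment, and then to generalization. In experiments, SmartSearch consistently outperforms existing baselines, and further quantitative analysis confirms significant gains in both search efficiency and query quality. The code is available at https://github.com/MYVAE/SmartSearch.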

Original Abstract

Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems by incorporating information retrieval capabilities. Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked. As a result, the generated queries often remain inaccurate, leading to unexpected retrieval results and ultimately limiting search agents' overall effectiveness. To mitigate this issue, we introduce SmartSearch, a framework built upon two key mechanisms: (1) Process rewards, which provide fine-grained supervision for the quality of each intermediate search query through Dual-Level Credit Assessment. (2) Query refinement, which promotes the optimization of query generation by selectively refining low-quality search queries and regenerating subsequent search rounds based on these refinements. To enable the search agent to progressively internalize the ability to improve query quality under the guidance of process rewards, we design a three-stage curriculum learning framework. This framework guides the agent through a progression from imitation, to alignment, and ultimately to generalization. Experimental results show that SmartSearch consistently surpasses existing baselines, and additional quantitative analyses further confirm its significant gains in both search efficiency and query quality. The code is available at https://github.com/MYVAE/SmartSearch.
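The selective refinement step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold, and the single-pass control flow are all assumptions made for illustration, and the scoring callable is only a stand-in for a process reward (it does not implement Dual-Level Credit Assessment, whose details the abstract does not give).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchStep:
    query: str
    score: float  # stand-in process reward; the 0..1 scale is an assumption


def refine_trajectory(
    queries: List[str],
    score_query: Callable[[str], float],
    refine_query: Callable[[str], str],
    regenerate_after: Callable[[List[str]], List[str]],
    threshold: float = 0.5,  # hypothetical cutoff for "low quality"
) -> List[SearchStep]:
    """Keep queries up to the first low-scoring one, refine that query,
    then regenerate the later rounds from the refined prefix."""
    kept: List[str] = []
    for q in queries:
        if score_query(q) < threshold:
            # Replace the low-quality query and discard everything after it.
            kept.append(refine_query(q))
            kept.extend(regenerate_after(kept))
            break
        kept.append(q)
    return [SearchStep(q, score_query(q)) for q in kept]


# Toy stand-ins (hypothetical): a real system would call a reward model,
# a refiner, and the search agent itself.
def toy_score(q: str) -> float:
    return 0.0 if "thing" in q else 1.0


def toy_refine(q: str) -> str:
    return q.replace("thing", "SmartSearch framework")


def toy_regenerate(prefix: List[str]) -> List[str]:
    return [f"follow-up to: {prefix[-1]}"]


steps = refine_trajectory(
    ["who proposed thing", "stale follow-up"],
    toy_score,
    toy_refine,
    toy_regenerate,
)
for step in steps:
    print(step.query, step.score)
```

The sketch makes a single pass and does not re-check regenerated queries; whether the actual framework iterates further is not stated in the abstract.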

6 Citations

0 Influential

51.033312448852 Altmetric

261.2 Score
