2605.28354v1 May 27, 2026 cs.AI

Plan Before Search: Search Agents Need Plan

Qibin Hou
Zhipeng Qian
Zihan Liang
Yufei Ma
Ben Chen
Huangyu Dai
Jiayi Ji
Chenyi Lei
Xiaoshuai Sun
Wenwu Ou

Training large language models as retrieval-augmented reasoning agents typically combines reinforcement learning (RL) with a supervised fine-tuning (SFT) cold start distilled from a stronger model. This paradigm overlooks two fundamental factors: the dependency structure among sub-skills, and the possibility that distillation is not the only route to capability acquisition. We study both through Plan, a structured agentic behavior for multi-hop retrieval that decomposes a question into ordered sub-questions before any retrieval is performed, so that each search step is anchored to a pre-designed sub-question instead of drifting under the influence of partially relevant documents retrieved earlier. Across three model families spanning 3B to 14B parameters, we find that an identical reward signal induces qualitatively different RL failure modes. This indicates that successful training hinges not only on reward design but also on model-specific feasibility conditions: sufficient initial entropy, training stability, and prerequisite sub-skills. Motivated by this, we propose a self-bootstrapping paradigm in which a small-scale seed model generates filtered trajectories that activate Plan in any target model, eliminating the need for distillation from an external stronger model. Our pipeline activates Plan in every tested model and consistently outperforms competitive baselines on multi-hop QA benchmarks.
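
As a rough illustration of the Plan behavior described in the abstract, the sketch below produces every sub-question before the first retrieval call and then issues one search per sub-question. It is a minimal sketch under stated assumptions, not the authors' implementation: `plan_then_search`, `Step`, `toy_planner`, `toy_retriever`, and the toy index are all hypothetical names and data invented for this example, and a real agent would use a language model as the planner and a search engine as the retriever.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One search step, tied to the sub-question it was planned for."""
    sub_question: str
    documents: List[str]


def plan_then_search(
    question: str,
    planner: Callable[[str], List[str]],
    retriever: Callable[[str], List[str]],
) -> List[Step]:
    """Decompose the question first, then run one retrieval per sub-question.

    Assumption: each query is simply the planned sub-question, so earlier
    retrieved documents cannot change what is searched for next.
    """
    # The full ordered plan is fixed before any retrieval is performed.
    sub_questions = planner(question)

    steps: List[Step] = []
    for sub_question in sub_questions:
        documents = retriever(sub_question)
        steps.append(Step(sub_question=sub_question, documents=documents))
    return steps


# Hypothetical stand-ins for a planner model and a search backend.
TOY_INDEX: Dict[str, List[str]] = {
    "Where is the Example Museum located?": [
        "The Example Museum is in Sampleville."
    ],
    "What river runs through that city?": [
        "The Demo River runs through Sampleville."
    ],
}


def toy_planner(question: str) -> List[str]:
    # A fixed two-hop plan; a real planner would generate this from the question.
    return list(TOY_INDEX.keys())


def toy_retriever(query: str) -> List[str]:
    # Exact-match lookup; unknown queries return no documents.
    return TOY_INDEX.get(query, [])


if __name__ == "__main__":
    question = "What river runs through the city of the Example Museum?"
    for step in plan_then_search(question, toy_planner, toy_retriever):
        print(step.sub_question, "->", step.documents)
```

Passing the planner and retriever as callables keeps the control flow separate from any particular model or search backend, which is the only design point this sketch is meant to show.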
