2601.18202v1 Jan 26, 2026 cs.AI

SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback

Rujun Han

Yanfei Chen

Zifeng Wang

I-Hung Hsu

Vishy Tirumalashetty

Eunsol Choi

Tomas Pfister

Fangyuan Xu

Jun Yan

Google

Chen-Yu Lee

Abstract

Deep search agents, which aim to answer complex questions requiring reasoning across multiple documents, can significantly speed up the information-seeking process. Collecting human annotations for this application is prohibitively expensive due to long and complex exploration trajectories. We propose an agentic pipeline that automatically generates high-quality, difficulty-controlled deep search question-answer pairs for a given corpus and a target difficulty level. Our pipeline, SAGE, consists of a data generator, which proposes QA pairs, and a search agent, which attempts to solve the generated questions and provides execution feedback to the data generator. The two components interact over multiple rounds to iteratively refine the question-answer pairs until they satisfy the target difficulty level. Our intrinsic evaluation shows that SAGE generates questions requiring diverse reasoning strategies while significantly increasing the correctness and difficulty of the generated data. Our extrinsic evaluation demonstrates up to a 23% relative performance gain on popular deep search benchmarks when deep search agents are trained on our synthetic data. Additional experiments show that agents trained on our data can adapt from fixed-corpus retrieval to Google Search at inference time without further training.
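The generate, solve, and refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the component signatures (`Generator`, `SearchAgent`), the exact-match correctness check, and the use of the agent's number of search steps as the difficulty measure are all hypothetical choices. The toy generator and agent at the bottom exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class Execution:
    """One search-agent attempt (hypothetical structure)."""
    predicted_answer: str
    num_search_steps: int


@dataclass
class Feedback:
    """Execution feedback passed back to the generator (hypothetical)."""
    correct: bool
    num_search_steps: int
    target_steps: int


# Hypothetical interfaces: the generator sees its previous QA pair plus feedback;
# the search agent takes a question and returns its execution trace summary.
Generator = Callable[[Optional[QAPair], Optional[Feedback]], QAPair]
SearchAgent = Callable[[str], Execution]


def sage_loop(generate: Generator, solve: SearchAgent,
              target_steps: int, max_rounds: int = 5) -> Optional[QAPair]:
    """Refine a QA pair until the agent solves it with >= target_steps searches."""
    qa: Optional[QAPair] = None
    feedback: Optional[Feedback] = None
    for _ in range(max_rounds):
        qa = generate(qa, feedback)
        run = solve(qa.question)
        # Assumed correctness check: normalized exact match against the gold answer.
        correct = run.predicted_answer.strip().lower() == qa.answer.strip().lower()
        # Assumed acceptance rule: correct AND hard enough (enough search steps).
        if correct and run.num_search_steps >= target_steps:
            return qa
        feedback = Feedback(correct, run.num_search_steps, target_steps)
    return None  # no pair met the target within the round budget


# --- Toy components for illustration only ---
def toy_generator(prev: Optional[QAPair], feedback: Optional[Feedback]) -> QAPair:
    # Ignores the feedback content and simply adds one "hop" per round.
    hops = 1 if prev is None else prev.question.count("->") + 2
    question = " -> ".join(f"doc{i}" for i in range(hops))
    return QAPair(question=question, answer=f"doc{hops - 1}")


def toy_agent(question: str) -> Execution:
    # Pretends to follow every hop with one search step each.
    hops = question.split(" -> ")
    return Execution(predicted_answer=hops[-1], num_search_steps=len(hops))


if __name__ == "__main__":
    print(sage_loop(toy_generator, toy_agent, target_steps=3))
```

With `target_steps=3`, the toy pair is rejected as too easy in the first two rounds and accepted in the third as a three-hop question. In a real pipeline the feedback would presumably tell the generator whether to make the question harder or to repair an incorrect or ambiguous answer; the toy generator above skips that distinction.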
