2606.12871v1 Jun 11, 2026 cs.AI

DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks

Mingyang Zhu
Lin Qiu
Ziwen Wang
Xunliang Cai
Wei Liu
Zheren Fu
University of Science and Technology of China
L. Zhang
Zhendong Mao
Jingxuan Han
Youpeng Wang
Xuezhi Cao

Search Agents (SAs) typically leverage large language models (LLMs) to support complex information-seeking tasks by autonomously exploring web sources and synthesizing information into comprehensive responses. Prior benchmarks for evaluating SAs mainly focus on specialized tasks that are unlikely to arise in real-world user scenarios. Moreover, their reliance on coarse task-level rubrics often limits the interpretability of the evaluation. To bridge these gaps, we introduce DailyReport, an open-ended benchmark for evaluating SA capabilities on daily search tasks. It contains 150 open-ended tasks with 3,546 associated rubrics, capturing widely discussed and timely information demands of real-world users. Each task is decomposed into subtasks and evaluated with cascade rubrics across disentangled dimensions. Through cascade performance attribution and user-centric aggregation, we derive highly interpretable scores for each dimension, along with a user preference score. Our results on 17 agentic systems show that current systems still fall short of users' expectations. To facilitate future research, our dataset and code are publicly available at https://github.com/AGI-Eval-Official/DailyReport.


