2605.28683v1 May 27, 2026 cs.AI

VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora

Jian Liang
Mu Xu
Hang Zhang
Yuting Xu
Jiayi Tian
Xin Xiong
Xiaoyun Zhang

Existing benchmarks have laid the foundation for travel planning agents by establishing API-centric paradigms. However, as the capabilities of autonomous agents continue to advance, their evaluation must evolve beyond simple tool execution toward handling the inherent complexities of the open web. Current benchmarks bypass core cognitive hurdles: they fail to account for information noise, ignore factual contradictions across multiple sources, and overlook the need to ground visual perception in logical planning. We introduce VeriTrip, a verifiable benchmark designed to meet the growing demands for agent robustness and reliability. VeriTrip shifts the evaluation focus to evidence-grounded reasoning over unstructured multimodal web corpora. It establishes a Multimodal Retrieval Base (MRB) derived from real-world sources, which requires agents to autonomously orchestrate queries across heterogeneous data. A synchronized Verifiable Knowledge Base (VKB) enables a cell-wise verification protocol that precisely quantifies factual reliability and distinguishes systematic reasoning failures from parametric hallucinations. Our evaluations across leading multimodal large language models (MLLMs) reveal a critical retrieval-reasoning trade-off: the cognitive load of autonomous retrieval significantly erodes instruction retention. VeriTrip provides the rigorous foundation necessary for the next generation of planning agents capable of operating in unconstrained, multimodal environments.
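
The abstract does not spell out how the cell-wise verification protocol works. The Python sketch below shows one way checking a plan cell by cell against a knowledge base could look, under stated assumptions. The plan is flattened into (entity, attribute, value) cells, and the VKB is modeled as a plain dictionary. The names `Cell`, `verify_cells`, and `cell_accuracy` and the three labels (`correct`, `mismatch`, `unsupported`) are hypothetical and are not VeriTrip's actual schema or API. Reading a `mismatch` as a reasoning failure and an `unsupported` cell as a possible parametric hallucination is one plausible mapping, not the paper's definition.

```python
from dataclasses import dataclass


# Hypothetical cell schema (assumption, not VeriTrip's format):
# each plan cell states one attribute value for one entity.
@dataclass(frozen=True)
class Cell:
    entity: str      # e.g. a hotel or attraction name
    attribute: str   # e.g. "price" or "district"
    value: str       # the value the agent wrote into its plan


def verify_cells(cells, vkb):
    """Label each cell by comparing it with a knowledge base.

    vkb maps entity -> {attribute: value}. Illustrative labels:
      "correct"      the value matches the knowledge base
      "mismatch"     the entity/attribute exists but the value differs
      "unsupported"  the entity or attribute is absent from the knowledge base
    """
    labels = []
    for cell in cells:
        record = vkb.get(cell.entity)
        if record is None or cell.attribute not in record:
            labels.append("unsupported")
        elif record[cell.attribute] == cell.value:
            labels.append("correct")
        else:
            labels.append("mismatch")
    return labels


def cell_accuracy(labels):
    # Fraction of cells verified as correct; 0.0 for an empty plan.
    return labels.count("correct") / len(labels) if labels else 0.0


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    vkb = {"Hotel A": {"price": "120", "district": "Shibuya"}}
    cells = [
        Cell("Hotel A", "price", "120"),         # matches the knowledge base
        Cell("Hotel A", "district", "Shinjuku"), # contradicts the knowledge base
        Cell("Hotel Z", "price", "90"),          # entity not in the knowledge base
    ]
    labels = verify_cells(cells, vkb)
    print(labels)  # ['correct', 'mismatch', 'unsupported']
    print(f"cell accuracy: {cell_accuracy(labels):.2f}")  # cell accuracy: 0.33
```

Keeping `mismatch` and `unsupported` as separate labels, rather than collapsing both into "wrong", is what lets a cell-level score report both how much of a plan is factually right and what kind of error each wrong cell represents.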
