2606.10460v1 Jun 09, 2026 cs.CL

LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake

Yusen Zhang
Eugene Wu
Eden Wu
Juliana Freire
Haonan Wang (National University of Singapore)
Jiaxiang Liu
Yurong Liu
Austin Wijaya
Tianle Zhou
Yijia Chen
Wanting You
Reya Vir
Daniela Pinto
Grace Fan (NYU)

Recent large language models (LLMs) have made rapid progress on reading-based question answering (QA), where the evidence is provided explicitly or can be retrieved trivially. Real-world questions, in contrast, are often not paired with accurate evidence documents: the useful evidence resides in massive data lakes, which makes search a prerequisite for answering. However, comprehensive benchmarks that require both searching and reasoning over large data lakes are lacking. To this end, we introduce LakeQA, a benchmark for search-centric question answering over data lakes that jointly emphasizes searching and reasoning capabilities. LakeQA is built on a heterogeneous collection of approximately 9.5 TB of text resources from Wikipedia and open-source government data, spanning structured and unstructured data. To ensure task quality, each sample is annotated by at least one Ph.D.-level expert. Each task requires long-horizon, multi-hop reasoning with implicit intermediate steps: agents must discover the correct documents and then compose evidence across sources to produce the answer. Experimental results on seven frontier LLMs demonstrate that LakeQA is challenging; for instance, GPT-5.2 achieves an exact-match score of only 18.37%. Overall, LakeQA provides a realistic testbed for developing LLM agents that can both find and analyze data in modern data lakes.
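
The abstract reports results as an exact-match score but does not describe how answers are normalized before comparison. As a rough illustration only, the sketch below computes exact match with a common SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names and the normalization rules are assumptions made for this example, not LakeQA's confirmed scoring procedure.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Assumed SQuAD-style normalization; not confirmed as LakeQA's own."""
    text = text.lower()
    # Drop ASCII punctuation. Note this also removes decimal points,
    # so numeric answers may need stricter handling in practice.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Remove English articles as standalone words.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def exact_match_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match a reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this metric a prediction earns credit only when it matches a gold answer after normalization, with no partial credit, which is one reason exact-match scores on long-horizon, multi-hop tasks tend to be low.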

