2605.29966v1 May 28, 2026 cs.AI

Compass: Navigating Global Marine Lead Data Integration through Expert-Guided LLM Agent

Bin Lu
Shanghai Jiao Tong University
Lei Zhou
Xinbing Wang
Cheng Zhou
Shuo Jiang
Yiming Liu
Ziyuan Sang
Jing Zhang
Mengdi Jin

Marine lead (Pb) and its isotopes are critical tracers for ocean circulation and anthropogenic pollution, yet in-situ observations remain costly and sparse. While vast historical records exist, they lie buried within the unstructured content of academic papers, creating "data silos" inaccessible to comprehensive analysis. Manual extraction is unscalable, while general-purpose Large Language Models (LLMs) lack the necessary domain-specific knowledge, leading to hallucinations and scientifically invalid outputs. To address this, we introduce an expert-guided adaptation approach that enables LLMs to perform rigorous scientific data extraction without fine-tuning. We operationalize this approach through Compass, an LLM agent framework enhanced by a Knowledge Tree co-designed with marine scientists, which decomposes complex tasks into verifiable steps, guiding the agent's reasoning to ensure scientific validity. Deploying Compass across a corpus of over 230,000 relevant open-access papers, we successfully extract 3,751 previously unincorporated Pb records. This effort establishes the largest integrated marine Pb database to date. Beyond standard metrics, Compass demonstrates superior reliability through multi-layered validation, achieving 92% accuracy as confirmed through expert manual verification. The newly integrated data expand coverage in previously under-sampled regions such as the East China Sea and the Southern Ocean, providing an enriched data foundation for future scientific discoveries. We release an interactive visualization platform to facilitate open scientific access. Our work demonstrates that expert-guided agents can effectively bridge the gap between general-purpose LLMs and high-stakes scientific domains, enabling scalable data discovery in geosciences.
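The abstract describes a Knowledge Tree that decomposes extraction into verifiable steps but does not give its structure. The following is a minimal sketch of that idea only, not the authors' implementation: the names `KnowledgeNode`, `validate`, `has_fields`, and `in_range`, the record fields, and the accepted units are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict  # one candidate Pb record extracted from a paper (hypothetical shape)


@dataclass
class KnowledgeNode:
    """One verifiable step; `check` returns an error message, or None if it passes."""
    name: str
    check: Callable[[Record], Optional[str]]
    children: list["KnowledgeNode"] = field(default_factory=list)


def validate(node: KnowledgeNode, record: Record) -> list[str]:
    """Walk the tree depth-first; a failing node reports itself and skips its children."""
    error = node.check(record)
    if error is not None:
        return [f"{node.name}: {error}"]
    failures = []
    for child in node.children:
        failures.extend(validate(child, record))
    return failures


def has_fields(*names):
    """Build a check that every named field is present."""
    def check(record):
        missing = [n for n in names if n not in record]
        return f"missing {missing}" if missing else None
    return check


def in_range(name, low, high):
    """Build a check that a numeric field lies in [low, high]."""
    def check(record):
        value = record[name]
        return None if low <= value <= high else f"{name}={value} outside [{low}, {high}]"
    return check


# Illustrative tree: the root checks completeness, the children check plausibility.
# The accepted units are an assumption, not taken from the paper.
tree = KnowledgeNode(
    "record",
    has_fields("lat", "lon", "pb_value", "unit"),
    children=[
        KnowledgeNode("latitude", in_range("lat", -90, 90)),
        KnowledgeNode("longitude", in_range("lon", -180, 180)),
        KnowledgeNode(
            "unit",
            lambda r: None if r["unit"] in {"pmol/kg", "pmol/L"}
            else f"unexpected unit {r['unit']!r}",
        ),
    ],
)

good = {"lat": 30.5, "lon": 125.0, "pb_value": 42.0, "unit": "pmol/kg"}
bad = {"lat": 130.5, "lon": 125.0, "pb_value": 42.0, "unit": "ppm"}

print(validate(tree, good))  # no failures for a well-formed record
print(validate(tree, bad))   # one failure each for latitude and unit
```

Because each node is checked on its own, a rejected record carries the name of the step that failed. This is one plausible way to make an agent's output auditable by domain experts.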
