2605.28032v1 May 27, 2026 cs.AI

PetroBench: A Benchmark for Large Language Models in Petroleum Engineering

Ting Zhang
Yingquan Wu
Hengyu Meng
Peng Zhou
Peng Li
Xiang Wang
Sen Wang

Large Language Models are increasingly applied in the petroleum industry, highlighting the need for a domain-specific evaluation framework. This study develops a benchmark for LLMs in petroleum engineering, including a three-stage process of data preprocessing, quality filtering, and multi-model validation. Using expert review, a standardized question bank with strong domain relevance and discriminative capability was constructed. The benchmark covers production, reservoir, and drilling engineering, with 1,200 questions across multiple-choice, true or false, term definition, and short-answer formats. Eight mainstream LLMs were evaluated under a unified API environment. Results show that models performed better on subjective than objective questions, indicating weaknesses in factual knowledge discrimination. The highest accuracies for multiple-choice and true or false questions were 65.3% and 74.3%, respectively. Gemini-3-Pro, Kimi-K2.5, and Claude-Opus-4.6-Thinking achieved the best overall scores of 72%-74%. Models performed best in production engineering and weakest in reservoir engineering. Chinese models showed advantages in multiple-choice questions, while international models performed slightly better in short-answer questions. The benchmark provides a reproducible and practical reference for evaluating and deploying LLMs in petroleum engineering.
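The abstract reports accuracy separately by question format and by discipline. A minimal sketch of that kind of breakdown for the objective formats follows. It is not the authors' scoring code: the record fields (`discipline`, `format`, `answer`, `prediction`), the sample records, and the case-insensitive exact-match rule are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: fraction correct}, grouping records by the given key.

    Assumes exact-match scoring after trimming whitespace and ignoring case,
    which only makes sense for objective formats (multiple-choice, true/false).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        group = r[key]
        totals[group] += 1
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[group] += 1
    return {g: correct[g] / totals[g] for g in totals}


# Hypothetical graded records; none of this is data from the paper.
records = [
    {"discipline": "production", "format": "multiple_choice", "answer": "B", "prediction": "b"},
    {"discipline": "reservoir", "format": "multiple_choice", "answer": "C", "prediction": "A"},
    {"discipline": "drilling", "format": "true_false", "answer": "True", "prediction": "True "},
    {"discipline": "reservoir", "format": "true_false", "answer": "False", "prediction": "True"},
]

by_format = accuracy_by_group(records, "format")
by_discipline = accuracy_by_group(records, "discipline")
```

Subjective formats (term definition, short answer) would need a rubric or a grader rather than exact match, so they are left out of this sketch.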

