2605.26463v1 May 26, 2026 cs.CL

Towards Error-Free EHRs: Reasoning-Intensive Consistency Verification Between Clinical Notes and Structured Tables in Electronic Health Records

P. Rabaey
Citations: 61
h-index: 4
Jiho Kim
Citations: 310
h-index: 9
Yeonsu Kwon
Citations: 320
h-index: 5
Jun-Min Lee
Citations: 4
h-index: 1
Edward Choi
Citations: 8
h-index: 2
Junseong Choi
Citations: 11
h-index: 2
Sujeong Im
Citations: 108
h-index: 3
Sangji Lee
Citations: 4
h-index: 1
Hyunwook Kwon
Citations: 11
h-index: 1
J. Kim
Citations: 448
h-index: 11
Minseo Kim
Citations: 11
h-index: 1
Jeewon Yang
Citations: 92
h-index: 3
H. Yoon
Citations: 0
h-index: 0

Data consistency between unstructured clinical notes and structured tables in Electronic Health Records (EHRs) is essential for patient safety and clinical decision-making. However, existing work on note-table consistency verification mainly relies on surface-level matching of numeric values or simple events. Such approaches fail to capture the reasoning underlying real-world EHR documentation, including clinical interpretation, event relations, and temporal changes. To address this gap, we introduce EHR-ReasonCon, a reasoning-intensive benchmark for note-table consistency verification. Built on MIMIC-III with expert-guided annotations, it comprises 8,048 entities derived from clinical notes and provides high-quality ground-truth labels. The annotation protocol is supported by specialized table-exploration tools to ensure systematic evidence retrieval and reliable consistency assessment. We also propose EHR-Inspector, an LLM-based framework that segments notes, extracts anchor entities and temporal references, and uses table-exploration tools to verify consistency against structured tables. Evaluated using expert-validated LLM-as-a-judge metrics under harsh and lenient criteria, EHR-Inspector achieves state-of-the-art performance across multiple model backbones. Analyses further demonstrate the effectiveness of its components and highlight differences from human verification.
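To make the shape of such a verification pipeline concrete, the following is a minimal sketch of the segment, extract, look up, and compare flow described above. It is not the EHR-Inspector implementation: the abstract describes an LLM-based framework with table-exploration tools, whereas this toy uses regular expressions and an in-memory list, which amounts to the kind of surface-level numeric matching the benchmark is designed to go beyond. All names here (`LabRow`, `segment_note`, `extract_claims`, `lookup`, `verify`, the 24-hour window, the three labels) are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LabRow:
    """One row of a toy structured table (hypothetical schema)."""
    item: str
    value: float
    charttime: datetime


def segment_note(note: str) -> list[str]:
    # Split after sentence-ending punctuation that is followed by whitespace,
    # so decimals such as "1.4" are not broken apart.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


# Rule-based stand-in for entity extraction: "<item> was|is|of <number>".
CLAIM_RE = re.compile(r"(?P<item>[A-Za-z]+) (?:was|is|of) (?P<value>\d+(?:\.\d+)?)")


def extract_claims(segment: str) -> list[tuple[str, float]]:
    return [(m["item"].lower(), float(m["value"])) for m in CLAIM_RE.finditer(segment)]


def lookup(table: list[LabRow], item: str, around: datetime, window: timedelta) -> list[LabRow]:
    # Stand-in for a table-exploration tool: filter by item name and time window.
    return [r for r in table if r.item == item and abs(r.charttime - around) <= window]


def verify(note, note_time, table, window=timedelta(hours=24), tol=1e-6):
    results = []
    for segment in segment_note(note):
        for item, value in extract_claims(segment):
            rows = lookup(table, item, note_time, window)
            if not rows:
                label = "not_found"
            elif any(abs(r.value - value) <= tol for r in rows):
                label = "consistent"
            else:
                label = "inconsistent"
            results.append((item, value, label))
    return results


# Toy data, invented for this example only.
note_time = datetime(2026, 1, 1, 12, 0)
table = [
    LabRow("creatinine", 1.4, note_time - timedelta(hours=2)),
    LabRow("potassium", 4.1, note_time - timedelta(hours=1)),
]
note = "Creatinine was 1.4 on admission. Potassium was 5.9 this morning. Lactate was 2.0."
results = verify(note, note_time, table)
```

In this toy run the creatinine claim matches a table row, the potassium claim conflicts with the recorded value, and the lactate claim has no row in the window. Claims that require clinical interpretation, event relations, or temporal reasoning would not be handled by value matching of this kind.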
