2606.16074v1 Jun 15, 2026 cs.CL

PVminerLLM2: Improving Structured Extraction of Patient Voice via Preference Optimization

Linhai Ma

Srivani Talakokkul

Ganesh Puthiaraju

Afshan Khan

Sarah Lowe

A. Roundtree

S. Fodeh

Ashley K. Hagaman

E. Irankhah

Sreeraj Ramachandran

Yale University

Motivation: Patient-generated text contains critical information on patients' lived experiences, social context, and care engagement, but remains largely unstructured, limiting its use in patient-centered outcomes research. Prior work introduced the PV-Miner benchmark and PVMinerLLM models for structured extraction. However, supervised fine-tuning (SFT) alone struggles with rare, fine-grained, and unevenly distributed errors, particularly in token-critical structured outputs.

Results: We present PVminerLLM2, an improved set of LLMs for structured patient voice extraction that applies preference optimization to address token-critical errors beyond the reach of supervised fine-tuning. Our method introduces (i) a preference objective with a token-level gated stabilization term that prevents degradation of absolute token likelihood under preference optimization, and (ii) confusion-aware preference pair construction to better capture low-separation distinctions. We further incorporate token-importance weighting and inverse-frequency reweighting to address token imbalance and class skew. Across multiple model sizes, PVminerLLM2 consistently outperforms strong baselines, achieving gains of up to 4.43% (Code), 3.50% (Sub-code), and 1.55% (Span), and it also outperforms baseline LLMs trained with existing preference optimization methods.

Availability and Implementation: The supplementary material, code, evaluation scripts, and trained models for PVminerLLM2 are publicly available at: https://github.com/Data-Mining-Lab-Yale/PVminerLLM2
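The abstract does not give the exact objective, so the following is only a rough sketch of the two generic ideas it names, not the authors' formulation. It assumes a standard DPO-style preference loss over per-token log-probabilities, a hinge penalty (in the spirit of DPO-Positive) as a stand-in for the "token-level gated stabilization term", and the common N / (K · count) rule for inverse-frequency weights. The function names and the parameters `beta` and `lam` are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rare classes weigh more.

    Assumption: this is the common 'balanced' heuristic, not necessarily
    the reweighting used in the paper.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def gated_preference_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected,
                          beta=0.1, lam=1.0):
    """DPO-style loss plus a gated per-token penalty (illustrative only).

    Each argument is a list of per-token log-probabilities under the policy
    (pol_*) or the frozen reference model (ref_*). The penalty is 'gated':
    a chosen token contributes only when its policy log-prob has fallen
    below its reference log-prob.
    """
    chosen_ratio = sum(pol_chosen) - sum(ref_chosen)
    rejected_ratio = sum(pol_rejected) - sum(ref_rejected)
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    preference_term = math.log1p(math.exp(-margin))
    # Hinge gate: zero for tokens whose likelihood did not degrade.
    penalty = sum(max(0.0, r - p) for p, r in zip(pol_chosen, ref_chosen))
    return preference_term + lam * penalty


# Toy usage with made-up numbers.
weights = inverse_frequency_weights(["a", "a", "a", "b"])
loss = gated_preference_loss([-1.0, -2.5], [-3.0, -3.0],
                             [-1.0, -2.0], [-3.0, -3.0])
```

The gate is what separates this from a plain preference loss: the margin alone can improve while the chosen response's absolute likelihood drops, and the hinge term charges for exactly that case without rewarding tokens that are already above the reference.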
