2606.16074v1 Jun 15, 2026 cs.CL

PVminerLLM2: Improving Structured Extraction of Patient Voice via Preference Optimization

Linhai Ma
Srivani Talakokkul
Ganesh Puthiaraju
Afshan Khan
Sarah Lowe
A. Roundtree
S. Fodeh
Ashley K. Hagaman
E. Irankhah
Sreeraj Ramachandran
Yale University

Motivation: Patient-generated text contains critical information on patients' lived experiences, social context, and care engagement, but it remains largely unstructured, which limits its use in patient-centered outcomes research. Prior work introduced the PV-Miner benchmark and PVMinerLLM models for structured extraction. However, supervised fine-tuning (SFT) alone struggles with rare, fine-grained, and unevenly distributed errors, particularly in token-critical structured outputs.

Results: We present PVminerLLM2, an improved set of LLMs for structured patient voice extraction that applies preference optimization to address token-critical errors beyond the reach of supervised fine-tuning. Our method introduces (i) a preference objective with a token-level gated stabilization term that prevents degradation of absolute token likelihood under preference optimization, and (ii) confusion-aware preference pair construction to better capture low-separation distinctions. We further incorporate token-importance weighting and inverse-frequency reweighting to address token imbalance and class skew. Across multiple model sizes, PVminerLLM2 consistently outperforms strong baselines, achieving gains of up to 4.43% (Code), 3.50% (Sub-code), and 1.55% (Span), and it also outperforms baseline LLMs trained with existing preference optimization methods.

Availability and Implementation: The supplementary material, code, evaluation scripts, and trained models for PVminerLLM2 are publicly available at: https://github.com/Data-Mining-Lab-Yale/PVminerLLM2
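
The abstract mentions inverse-frequency reweighting for class skew but does not give a formula. As a rough illustration only, the sketch below uses the common "balanced" scheme, weight = N / (K * count), which is an assumption and may differ from what PVminerLLM2 actually implements. The function name `inverse_frequency_weights` and the example labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    Assumed formula (not taken from the paper): N / (K * count_c), where
    N is the number of examples and K is the number of distinct classes.
    """
    counts = Counter(labels)
    n_examples = len(labels)
    n_classes = len(counts)
    return {
        label: n_examples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical label sequence, for illustration only.
labels = ["A", "A", "A", "B"]
weights = inverse_frequency_weights(labels)
```

Under this scheme the rare class "B" receives a larger weight than the frequent class "A", so errors on rare classes contribute more to a weighted loss.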
