PVminerLLM2 Uses Preference Optimization to Improve Structured Patient Voice Extraction

Researchers introduce PVminerLLM2, an improved set of LLMs for structured extraction of patient voice from unstructured text. The model uses preference optimization with token-level gated stabilization and confusion-aware pair construction to outperform supervised fine-tuning baselines. The code and trained models are publicly available.

iGEN Editorial

June 16, 2026

Unstructured patient-generated text captures critical information about lived experiences, social context, and care engagement, but its clinical value remains locked without structured extraction. A new family of language models, PVminerLLM2, aims to unlock that data by applying preference optimization — a technique that refines model outputs beyond what supervised fine-tuning (SFT) can achieve.

Limitations of Supervised Fine-Tuning

Prior work established the PV-Miner benchmark and the PVMinerLLM models for structured extraction of patient voice. However, according to the researchers, supervised fine-tuning alone struggles with rare, fine-grained, and unevenly distributed errors, particularly in token-critical structured outputs. These errors — such as mislabeling a single token in a clinical code — can render an extraction useless.
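A minimal Python sketch can make the point about token-critical outputs concrete. The label names, the serialized output format, and the exact-match scoring rule below are illustrative assumptions, not details taken from the PV-Miner benchmark.

```python
def exact_match(pred: str, gold: str) -> bool:
    """Strict scoring: the whole structured output must match token for token."""
    return pred.split() == gold.split()


def token_overlap(pred: str, gold: str) -> float:
    """Share of gold tokens reproduced at the same position in the prediction."""
    pred_tokens, gold_tokens = pred.split(), gold.split()
    hits = sum(1 for p, g in zip(pred_tokens, gold_tokens) if p == g)
    return hits / max(len(gold_tokens), 1)


# Hypothetical gold and predicted outputs; only the sub-code token differs.
gold = "code = SocialSupport ; sub_code = Family ; span = my sister drives me"
pred = "code = SocialSupport ; sub_code = Friends ; span = my sister drives me"

overlap = token_overlap(pred, gold)   # 13 of 14 tokens agree
is_correct = exact_match(pred, gold)  # False: one wrong token fails the record
```

Under these assumptions, a prediction that agrees with the gold output on 13 of 14 tokens still counts as a failed extraction, which is the kind of error a sequence-level training signal can underweight.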

Key Technical Innovations

The team behind PVminerLLM2 introduces four main innovations to overcome SFT limitations:

Token-level gated stabilization term: prevents degradation of absolute token likelihood under preference optimization, so the model does not forget high-confidence tokens while learning from preferences.
Confusion-aware preference pair construction: better captures low-separation distinctions by deliberately constructing training pairs from tokens the model finds hardest to distinguish.
Token-importance weighting: assigns higher weight to tokens critical for correct extraction.
Inverse-frequency reweighting: addresses token imbalance and class skew, common in medical text where certain codes appear far more often than others.
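The article does not give the paper's formulas, so the following Python sketch is only one plausible reading of how the four ideas could fit together. It assumes a DPO-style pairwise objective; the function names, the hyperparameters `beta` and `lam`, the hinge-shaped gate, and all numbers are hypothetical and should not be taken as the authors' implementation.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each label by 1/count, rescaled so the mean weight over examples is 1."""
    counts = Counter(labels)
    raw = {label: 1.0 / count for label, count in counts.items()}
    mean = sum(raw[label] for label in labels) / len(labels)
    return {label: weight / mean for label, weight in raw.items()}


def hardest_negative(gold, label_scores):
    """Confusion-aware pairing: reject the wrong label the model scores highest."""
    wrong = {label: s for label, s in label_scores.items() if label != gold}
    return max(wrong, key=wrong.get)


def weighted_log_ratio(policy_lp, ref_lp, weights):
    """Importance-weighted sum of per-token log(pi / pi_ref)."""
    return sum(w * (p - r) for p, r, w in zip(policy_lp, ref_lp, weights))


def gated_stabilizer(policy_lp, ref_lp, weights):
    """Penalty that switches on only for tokens whose log-likelihood fell below the reference."""
    return sum(w * max(0.0, r - p) for p, r, w in zip(policy_lp, ref_lp, weights))


def preference_loss(chosen, rejected, beta=0.1, lam=1.0):
    """DPO-style loss on one pair plus the gated term on the chosen output.

    `chosen` and `rejected` are (policy_lp, ref_lp, weights) tuples of equal-length lists.
    """
    margin = beta * (weighted_log_ratio(*chosen) - weighted_log_ratio(*rejected))
    # Numerically stable -log(sigmoid(margin)).
    dpo = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return dpo + lam * gated_stabilizer(*chosen)


# Toy numbers, invented for illustration.
class_weights = inverse_frequency_weights(["Common", "Common", "Common", "Rare"])
rejected_label = hardest_negative("Family", {"Family": 0.48, "Friends": 0.45, "Clinician": 0.07})

chosen = ([-0.2, -1.5], [-0.3, -1.0], [1.0, 2.0])    # second token is the critical one
rejected = ([-0.2, -2.0], [-0.3, -1.2], [1.0, 2.0])
loss = preference_loss(chosen, rejected)
```

In this toy pair the preferred output already beats the rejected one on the pairwise margin, yet its critical token has become less likely than under the reference model. A plain pairwise loss would not object to that drop, while the gated term adds a penalty for it, which is the behavior the stabilization idea is described as targeting.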

Performance Gains Across Metrics

Evaluated across multiple model sizes, PVminerLLM2 consistently outperformed strong baselines, including baseline LLMs trained with existing preference optimization methods. The improvements are summarized below:

Metric	Gain over Baseline
Code	4.43%
Sub-code	3.50%
Span	1.55%

These gains, while modest in absolute terms, represent substantial reductions in token-critical errors for structured extraction tasks.

Availability and Implementation

The supplementary material, code, evaluation scripts, and trained models for PVminerLLM2 are publicly available through the project URL given with the arXiv paper. The open release lets other researchers and enterprises apply the same preference optimization techniques to their own structured extraction problems, whether in healthcare documentation, clinical trial data mining, or other domains where token-level accuracy is paramount.

The paper, "PVminerLLM2: Improving Structured Extraction of Patient Voice via Preference Optimization", was written by Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Elyas Irankhah, Sreeraj Ramachandran, Ashley Hagaman, Sarah Lowe, and Aimee Roundtree, and appeared on arXiv in the Computer Science category.
