Few-Shot Biomedical Relation Extraction with LLMs: A Viable Alternative to Supervised Learning?

A new study on arXiv investigates few-shot biomedical relation extraction using large language models (LLMs). The best model achieved a micro-F1 of 0.44, surpassing prior few-shot results but falling short of the supervised baseline. On macro-F1, however, prompt-based methods outperformed supervised learning, particularly on rare relation types, highlighting the potential of LLMs in low-resource settings.

iGEN Editorial

June 16, 2026

Biomedical relation extraction (BioRE) converts unstructured biomedical literature into structured knowledge, but traditional supervised approaches depend on costly annotated datasets, which limits their scalability across relation types and domains. A preprint on arXiv (ID: 2606.15412) by Jakob Mraz, Tomaž Curk, and Blaž Zupan investigates whether large language models (LLMs) can serve as a viable alternative through few-shot, prompt-based learning.

Task Formulations and Experimental Design

The study compares two task formulations for few-shot BioRE: pairwise classification, which predicts relations for individual entity pairs, and joint generation, which extracts multiple relations in a single model call. Experiments were conducted on the BioREDirect dataset. The authors report a clear precision-recall trade-off between the two approaches.

Formulation	Precision	Recall	Computational efficiency
Pairwise classification	Lower	Higher	Lower
Joint generation	Higher	Lower	Higher

The joint generation method is more computationally efficient but sacrifices recall, while pairwise classification captures more relations at the cost of precision.
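The difference in cost between the two formulations can be illustrated with a minimal Python sketch that builds few-shot prompts both ways. The function names, prompt wording, demonstrations, and label set below are hypothetical; the article does not describe the authors' actual prompts or the BioREDirect relation schema, and relation direction is ignored for simplicity.

```python
from itertools import combinations

# Hypothetical label set for illustration only; not the BioREDirect schema.
LABELS = ["Association", "Positive_Correlation", "Negative_Correlation", "None"]

# Made-up demonstrations, one per formulation, not taken from the dataset.
PAIR_DEMO = "Text: X inhibits Y.\nPair: (X, Y)\nAnswer: Negative_Correlation"
JOINT_DEMO = "Text: X inhibits Y.\nEntities: X, Y\nRelations:\nX | Negative_Correlation | Y"


def pairwise_prompts(demo, text, entities):
    """Pairwise classification: one few-shot prompt (one model call) per entity pair."""
    prompts = []
    # Unordered pairs only; a direction-aware setup would need extra handling.
    for head, tail in combinations(entities, 2):
        prompts.append(
            f"{demo}\n\nText: {text}\n"
            f"Pair: ({head}, {tail})\n"
            f"Answer with one label from {LABELS}:"
        )
    return prompts


def joint_prompt(demo, text, entities):
    """Joint generation: a single few-shot prompt asking for all relations at once."""
    return (
        f"{demo}\n\nText: {text}\n"
        f"Entities: {', '.join(entities)}\n"
        "Relations (one 'head | label | tail' per line):"
    )


# Invented example sentence and entities.
TEXT = "Drug A lowered expression of gene B in patients with disease C."
ENTITIES = ["Drug A", "gene B", "disease C"]

pair_calls = pairwise_prompts(PAIR_DEMO, TEXT, ENTITIES)
joint_call = joint_prompt(JOINT_DEMO, TEXT, ENTITIES)
print(len(pair_calls), "pairwise calls vs 1 joint call")
```

With n entities, the pairwise formulation needs n(n-1)/2 model calls (45 for ten entities) while joint generation needs one, which is one plausible reason for the efficiency difference in the table. Asking the model to enumerate everything in one pass also makes it easier for relations to be left out, which is consistent with, though not proof of, the lower recall reported for joint generation.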

Key Performance Metrics

The best-performing model achieved a micro-F1 score of 0.44, substantially outperforming previous few-shot results (0.34) but remaining below the supervised baseline (0.56). Notably, much of this gap is attributable to a single ambiguously defined relation type. When evaluated using macro-F1, which better captures performance across imbalanced relation types, prompt-based approaches outperformed the supervised baseline (0.45 vs. 0.38), particularly on rare relation types.
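To see why micro-F1 and macro-F1 can rank systems differently, here is a minimal Python sketch that computes both from per-type true-positive, false-positive, and false-negative counts. The counts and the type names "frequent" and "rare" are invented for illustration and are not the study's data.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when precision and recall are both zero or undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(counts):
    """counts maps relation type -> (tp, fp, fn).

    Micro-F1 pools counts across types, so frequent types dominate.
    Macro-F1 averages per-type F1, so every type counts equally.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


# Invented counts: system A is strong on the frequent type, system B on the rare one.
system_a = {"frequent": (80, 20, 20), "rare": (1, 4, 4)}
system_b = {"frequent": (60, 40, 40), "rare": (4, 1, 1)}

for name, counts in [("A", system_a), ("B", system_b)]:
    micro, macro = micro_macro_f1(counts)
    print(f"system {name}: micro-F1 = {micro:.2f}, macro-F1 = {macro:.2f}")
```

Running it prints a micro-F1 of 0.77 for A versus 0.61 for B, but a macro-F1 of 0.50 versus 0.70. This is the same kind of ranking reversal the study reports between the supervised baseline and the prompt-based approaches, although the magnitudes here are arbitrary.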

Implications for Low-Resource Applications

These findings underscore the potential of LLMs for BioRE in low-resource settings where annotated data is scarce. The stronger macro-F1 on rare relation types suggests that LLMs may generalize better than supervised models to infrequent relations, a common challenge in biomedical text. However, the study emphasizes the importance of well-defined relation schemas, since ambiguous definitions degrade performance.

Limitations and Future Directions

While prompt-based learning shows promise, the micro-F1 gap indicates that supervised learning remains superior when sufficient annotated data is available. The authors note that the ambiguity of a single relation type accounts for most of the performance difference. Future work may focus on refining relation definitions or combining few-shot LLM approaches with small amounts of supervised data to bridge the remaining gap.
