
Few-Shot Biomedical Relation Extraction with LLMs: A Viable Alternative to Supervised Learning?

A new study on arXiv investigates few-shot biomedical relation extraction using large language models (LLMs). The best model achieved a micro-F1 of 0.44, surpassing prior few-shot results but falling short of the supervised baseline. On macro-F1, however, prompt-based methods outperformed supervised learning, particularly on rare relation types, highlighting LLMs' potential in low-resource settings.

iGEN Editorial
June 16, 2026

Biomedical relation extraction (BioRE) converts unstructured biomedical literature into structured knowledge, but traditional supervised approaches depend on costly annotated datasets, which limits their scalability across relation types and domains. A preprint on arXiv (ID: 2606.15412) by Jakob Mraz, Tomaž Curk, and Blaž Zupan investigates whether large language models (LLMs) can serve as a viable alternative through few-shot, prompt-based learning.

Task Formulations and Experimental Design

The study compares two task formulations for few-shot BioRE: pairwise classification, which predicts relations for individual entity pairs, and joint generation, which extracts multiple relations in a single model call. Experiments were conducted on the BioREDirect dataset. The authors report a clear precision-recall trade-off between the two approaches.

| Formulation | Precision | Recall | Computational efficiency |
|---|---|---|---|
| Pairwise classification | Lower | Higher | Lower |
| Joint generation | Higher | Lower | Higher |

The joint generation method is more computationally efficient but sacrifices recall, while pairwise classification captures more relations at the cost of precision.
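To make the efficiency difference concrete, here is a minimal Python sketch of how the two formulations might build their prompts. The prompt wording, the label set, and the function names (`pairwise_prompts`, `joint_prompt`) are hypothetical and not taken from the paper. The sketch also assumes undirected entity pairs; a directed scheme would enumerate ordered pairs with `itertools.permutations` instead.

```python
from itertools import combinations

# Hypothetical label set for illustration only; not the BioREDirect schema.
# "None" lets the pairwise classifier say that a pair has no relation.
RELATION_TYPES = ["Association", "Positive_Correlation", "Negative_Correlation", "None"]


def pairwise_prompts(text: str, entities: list[str]) -> list[str]:
    """Build one classification prompt (one model call) per unordered entity pair."""
    labels = ", ".join(RELATION_TYPES)
    return [
        f"Text: {text}\n"
        f"Which relation holds between '{a}' and '{b}'? Answer with one of: {labels}."
        for a, b in combinations(entities, 2)
    ]


def joint_prompt(text: str, entities: list[str]) -> str:
    """Build a single prompt asking for every relation in the text at once."""
    labels = ", ".join(t for t in RELATION_TYPES if t != "None")
    return (
        f"Text: {text}\n"
        f"Entities: {', '.join(entities)}\n"
        f"List every relation as 'head | relation | tail', using only: {labels}."
    )


if __name__ == "__main__":
    doc = "Tamoxifen reduced tumor growth in ESR1-positive breast cancer."  # toy text
    ents = ["Tamoxifen", "ESR1", "breast cancer"]
    print("pairwise calls:", len(pairwise_prompts(doc, ents)))  # 3 pairs -> 3 calls
    print("joint calls:", 1)  # one call per document, whatever the entity count
```

With n entities, the pairwise formulation issues n(n-1)/2 calls while joint generation issues one, so the cost gap summarized in the table widens quickly on entity-dense abstracts.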

Key Performance Metrics

The best-performing model achieved a micro-F1 score of 0.44, substantially outperforming previous few-shot results (0.34) but remaining below the supervised baseline (0.56). Notably, much of this gap is attributable to a single ambiguously defined relation type. On macro-F1, which averages the F1 score of each relation type with equal weight and therefore better reflects performance across imbalanced relation types, prompt-based approaches outperformed the supervised baseline (0.45 vs. 0.38), particularly on rare relation types.
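The difference between the two averages is easy to see in code. The sketch below computes micro-F1 by pooling true positives, false positives, and false negatives across relation types, and macro-F1 by averaging per-type F1 scores. The per-type counts are invented for illustration and do not come from the study; they only show how a system that is weaker on a frequent type but stronger on a rare one can lose on micro-F1 yet win on macro-F1.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # predicted relations of this type that match the gold annotation
    fp: int  # predicted relations of this type absent from the gold annotation
    fn: int  # gold relations of this type that the system missed


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written as 2TP / (2TP + FP + FN); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_type: dict[str, Counts]) -> float:
    """Pool counts over all types, so frequent relation types dominate the score."""
    tp = sum(c.tp for c in per_type.values())
    fp = sum(c.fp for c in per_type.values())
    fn = sum(c.fn for c in per_type.values())
    return f1(tp, fp, fn)


def macro_f1(per_type: dict[str, Counts]) -> float:
    """Average per-type F1, so a rare type weighs as much as a frequent one."""
    scores = [f1(c.tp, c.fp, c.fn) for c in per_type.values()]
    return sum(scores) / len(scores)


# Invented counts: 100 gold relations of a frequent type, 10 of a rare type.
system_a = {"frequent": Counts(tp=80, fp=20, fn=20), "rare": Counts(tp=1, fp=1, fn=9)}
system_b = {"frequent": Counts(tp=60, fp=40, fn=40), "rare": Counts(tp=7, fp=3, fn=3)}

if __name__ == "__main__":
    for name, system in [("A", system_a), ("B", system_b)]:
        print(name, f"micro={micro_f1(system):.2f}", f"macro={macro_f1(system):.2f}")
    # prints: A micro=0.76 macro=0.48, then B micro=0.61 macro=0.65
```

System A resembles a model tuned to the frequent type, while system B trades some frequent-type accuracy for much better rare-type coverage, which is the pattern macro-F1 rewards.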

Implications for Low-Resource Applications

These findings underscore the potential of LLMs for BioRE in low-resource settings where annotated data is scarce. The superior macro-F1 performance on rare types suggests that LLMs can generalize better to less frequent relations, a common challenge in biomedical domains. However, the study emphasizes the importance of well-defined relation schemas to avoid ambiguity that degrades performance.
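One practical way to act on that point is to pair each label with an explicit definition in the prompt and to discard generated relations that fall outside the schema. The minimal Python sketch below assumes a joint-generation output format of one `head | relation | tail` triple per line; the label names, their definitions, and the helper names (`schema_instructions`, `parse_joint_output`) are hypothetical illustrations, not the paper's schema or code.

```python
# Hypothetical schema: each label maps to the definition shown to the model.
SCHEMA = {
    "Positive_Correlation": "An increase in the head entity is reported to increase the tail entity.",
    "Negative_Correlation": "An increase in the head entity is reported to decrease the tail entity.",
    "Bind": "The head and tail entities are reported to physically bind.",
}


def schema_instructions(schema: dict[str, str]) -> str:
    """Render one '- label: definition' line per relation type for the prompt."""
    return "\n".join(f"- {label}: {definition}" for label, definition in schema.items())


def parse_joint_output(raw: str, schema: dict[str, str]) -> list[tuple[str, str, str]]:
    """Keep well-formed 'head | relation | tail' lines whose label is in the schema."""
    triples = []
    for line in raw.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3 or not all(parts) or parts[1] not in schema:
            continue  # skip chatty preambles, malformed lines, and unknown labels
        head, relation, tail = parts
        triples.append((head, relation, tail))
    return triples


if __name__ == "__main__":
    print(schema_instructions(SCHEMA))
    raw = (  # toy model output
        "Here are the relations:\n"
        "Tamoxifen | Negative_Correlation | breast cancer\n"
        "Tamoxifen | Treats | breast cancer\n"
    )
    print(parse_joint_output(raw, SCHEMA))  # only the in-schema triple survives
```

Silently dropping out-of-schema labels keeps the output clean; logging the rejected lines instead could help reveal which labels the model finds ambiguous.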

Limitations and Future Directions

While prompt-based learning shows promise, the micro-F1 gap indicates that supervised learning remains superior when sufficient annotated data is available. The authors note that the ambiguity of a single relation type accounts for most of the performance difference. Future work may focus on refining relation definitions or combining few-shot LLM approaches with small amounts of supervised data to bridge the remaining gap.


