
DeepRoot Multi-Agent System Enables Therapeutic Reasoning Over Historical Medical Texts with 47.6% Recall at R@20

DeepRoot is a multi-agent LLM system that jointly builds and utilizes a verified knowledge graph for therapeutic reasoning over historical medical texts. Applied to the Shen Nong Ben Cao Jing, it recovers 10 of 21 held-out compound-disease treatment pairs at R@20 (47.6%), compared with 4.8% for a raw corpus LLM and roughly 2.4% for a random baseline. The system also hallucinates evidence on 7-10% of claims, compared with 87% for tool-using LLMs, and the authors present it as a scalable method for mining historical medical knowledge.

iGEN Editorial
June 16, 2026

Historical medical archives and traditional medicines hold immense potential for drug discovery, according to a paper on arXiv. However, pre-ontological prose and idiosyncratic taxonomies make the data hard to standardize for use in current biomedical pipelines. The paper reports that no existing LLM agent system, whether tool-calling, retrieval-augmented, or agentic deep-research, can convert such text into verifiable drug-discovery leads at scale. DeepRoot, a multi-agent LLM system introduced in the paper, aims to close this gap by jointly building and utilizing a verified knowledge graph.

The Problem: Unstructured Historical Medical Data

According to the paper, historical medical texts contain valuable knowledge but are not machine-readable because of their non-standard terminology and narrative structure. This prevents direct application of modern biomedical pipelines for drug discovery. Existing LLM approaches, including those with tool-calling capabilities, struggle with hallucination and lack systematic reasoning. The authors note that grounding and reasoning, though often conflated, are separable axes that a system can compose for therapeutic reasoning.

DeepRoot's Architecture: Knowledge Graph and Multi-Agent Coordination

DeepRoot is a multi-agent system built on large language models (LLMs) that coordinates multiple agents to both construct and query a verified knowledge graph (KG). According to the paper, the system separates the task of building the knowledge graph from the task of reasoning over it. This allows the KG to serve as a factual grounding layer, while LLMs provide flexible reasoning. The multi-agent setup lets the system combine structured knowledge from the graph with natural language inference, with the aim of producing verifiable drug-discovery leads.
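The separation between a grounding layer and a reasoning layer can be illustrated with a small sketch. This is not the paper's implementation: the `Triple`, `VerifiedKG`, and `ground_claims` names, the `treats` relation, and the placeholder entities are all hypothetical, and the `verified` flag stands in for whatever verification step DeepRoot actually performs. The idea shown is only that claims proposed by a reasoning step are accepted as grounded when a matching verified edge exists in the graph.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """One hypothetical KG edge, e.g. (compound, 'treats', disease)."""
    subject: str
    relation: str
    obj: str


@dataclass
class VerifiedKG:
    """Toy grounding layer: a set of triples that passed verification."""
    triples: set = field(default_factory=set)

    def add(self, triple: Triple, verified: bool) -> None:
        # Only verified triples enter the graph; unverified ones are dropped.
        if verified:
            self.triples.add(triple)

    def supports(self, triple: Triple) -> bool:
        return triple in self.triples


def ground_claims(kg: VerifiedKG, claims: list) -> tuple:
    """Split claims from a reasoning step into grounded and ungrounded lists."""
    grounded = [c for c in claims if kg.supports(c)]
    ungrounded = [c for c in claims if not kg.supports(c)]
    return grounded, ungrounded


# Placeholder data (assumption): no real compounds or diseases are implied.
kg = VerifiedKG()
kg.add(Triple("compound_A", "treats", "disease_X"), verified=True)
kg.add(Triple("compound_B", "treats", "disease_Y"), verified=False)

claims = [
    Triple("compound_A", "treats", "disease_X"),  # backed by a verified edge
    Triple("compound_B", "treats", "disease_Y"),  # never entered the graph
]
grounded, ungrounded = ground_claims(kg, claims)
print(len(grounded), len(ungrounded))
```

In this toy setup the reasoning step is free to propose anything, but only claims with a verified edge behind them count as evidence, which is one way to read the paper's description of the KG as a factual grounding layer.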

Performance Results: Accuracy and Hallucination Rates

The authors evaluated DeepRoot on the Shen Nong Ben Cao Jing, a classic Chinese medical text. The paper reports that DeepRoot recovers 10 of 21 held-out compound-disease treatment pairs at R@20, a recall of 47.6%. This compares to 4.8% for a raw corpus LLM and approximately 2.4% for random chance. In an LLM-as-judge audit of reasoning quality, DeepRoot outranked both baseline LLMs and LLMs with direct tool-call access to the same APIs that DeepRoot itself queries.
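The R@20 figure follows directly from the counts: 10 recovered pairs out of 21 held-out pairs. The sketch below shows one common way to compute recall at k, assuming each held-out pair counts as recovered when the true disease appears among the top k ranked candidates for its compound. The paper's exact evaluation protocol is not described here, so the `recall_at_k` function, the ranking format, and the placeholder names are assumptions for illustration only.

```python
def recall_at_k(rankings: dict, held_out: list, k: int = 20) -> float:
    """Fraction of held-out (compound, disease) pairs whose disease
    appears in the top-k ranked candidates for that compound."""
    hits = 0
    for compound, disease in held_out:
        if disease in rankings.get(compound, [])[:k]:
            hits += 1
    return hits / len(held_out)


# Synthetic data (assumption): 21 placeholder pairs, 10 of them ranked in the top 20.
held_out = [(f"compound_{i}", f"disease_{i}") for i in range(21)]
rankings = {}
for i, (compound, disease) in enumerate(held_out):
    fillers = [f"other_{j}" for j in range(30)]
    if i < 10:
        # True disease placed at rank 6, inside the top 20.
        rankings[compound] = fillers[:5] + [disease] + fillers[5:]
    else:
        # True disease placed at rank 31, outside the top 20.
        rankings[compound] = fillers + [disease]

score = recall_at_k(rankings, held_out, k=20)
print(f"R@20 = {score:.1%}")  # 10 / 21 -> R@20 = 47.6%
```

By the same arithmetic, the 4.8% reported for the raw corpus LLM is consistent with 1 recovered pair out of 21.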

A critical finding concerns hallucination rates. Tool-using LLMs hallucinated evidence on 87% of claims, according to the paper. DeepRoot, by contrast, hallucinated on only 7-10% of claims. Graph-only inference produced no hallucinated claims (0%) but ranked lowest on reasoning coherence. DeepRoot's combined KG+LLM approach was the only condition to win on both axes: low hallucination and high reasoning quality.

System Condition       | Recovery Rate (R@20)      | Hallucination Rate        | Reasoning Coherence (Rank)
Raw corpus LLM         | 4.8%                      | (not reported separately) | Lower
Random baseline        | ~2.4%                     | -                         | -
Tool-using LLMs        | (not reported)            | 87%                       | Lower than DeepRoot
Graph-only inference   | (not reported)            | 0%                        | Lowest
DeepRoot (KG+LLM)      | 47.6%                     | 7-10%                     | Highest

Implications for Drug Discovery and Medical AI

The paper argues that DeepRoot points toward a systematic route for mining and repurposing historical medical knowledge. By treating grounding and reasoning as separable axes, the system demonstrates that combining a verified knowledge graph with LLM-based reasoning can simultaneously reduce hallucination and improve reasoning quality. This approach could enable scalable conversion of pre-ontological medical texts into structured, actionable knowledge for drug development pipelines. The results on the Shen Nong Ben Cao Jing suggest that similar methods could be applied to other historical medical archives, potentially uncovering treatment leads that have been overlooked in modern research.

