Cordyceps: New Data Poisoning Attack Covertly Controls Large Language Models

A new paper on arXiv presents Cordyceps, a data poisoning attack that embeds covert control instructions into large language models through semantic associations. Tested across five LLMs, it achieves up to 93% attack success after backdoor defenses and up to 98% after prompt injection defenses, and it raises the average attack success rate by approximately 40% over heuristic-based prompt injection attacks.

iGEN Editorial

June 16, 2026

Large language models (LLMs) are increasingly fine-tuned on uncurated text datasets, creating an opening for adversaries to inject malicious behavior. A new attack method, named Cordyceps, demonstrates a more subtle and persistent threat than previously known. According to a paper on arXiv, Cordyceps teaches an LLM an information hiding scheme via data poisoning, enabling covert control over the model's outputs without relying on fixed trigger phrases.

The Attack Mechanism

Traditional poisoning attacks depend on specific trigger words that defenses can detect and neutralize. Cordyceps, by contrast, builds semantic associations between shared knowledge—such as common facts or concepts—and attacker-chosen phrases. The paper explains that this induces a hiding scheme capable of encoding and decoding arbitrary malicious instructions. The attack is named after the parasitic fungus that takes over its host, reflecting the attack's ability to subvert the model from within.

Performance Metrics

The researchers evaluated Cordyceps across 5 LLMs, using 3 backdoor defenses and 4 prompt injection defenses. Even with only a small fraction of the training data poisoned, covert control attacks outperformed heuristic-based prompt injection attacks, improving the average attack success rate by approximately 40%.
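An improvement of "approximately 40%" can be read two ways: as a gain in percentage points or as a relative gain over the baseline. The summary here does not say which one the paper means. The sketch below shows how the two readings differ, assuming that attack success rate is simply the share of evaluation trials in which the attacker's goal was met. The function names and the trial outcomes are illustrative placeholders, not the paper's code or data.

```python
def attack_success_rate(outcomes):
    """Fraction of trials in which the attack succeeded (assumed definition)."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(bool(o) for o in outcomes) / len(outcomes)


def improvement(baseline_asr, new_asr):
    """Return (percentage-point gain, relative gain) of new_asr over baseline_asr."""
    if baseline_asr <= 0:
        raise ValueError("baseline must be positive to compute a relative gain")
    diff = new_asr - baseline_asr
    return diff * 100, diff / baseline_asr


# Placeholder outcomes for illustration only, NOT results from the paper.
heuristic_outcomes = [True] * 5 + [False] * 5   # 5 of 10 trials succeed
covert_outcomes = [True] * 9 + [False] * 1      # 9 of 10 trials succeed

points, relative = improvement(
    attack_success_rate(heuristic_outcomes),
    attack_success_rate(covert_outcomes),
)
print(f"gain: {points:.0f} percentage points, {relative:.0%} relative")
```

With these placeholder numbers, the same pair of success rates gives a 40-point gain but an 80% relative gain, which is why it is worth checking the paper for the exact definition before quoting the figure.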

Resilience Against Defenses

A key finding is Cordyceps' ability to circumvent existing mitigation strategies. The attack maintained a success rate of up to 93% after backdoor defenses—which typically involve outlier detection or clean-data regularization—and up to 98% after prompt injection defenses, which monitor outputs for malicious instructions. The following table summarizes the attack's persistence:

Defense Type	Maximum Attack Success Rate
Backdoor defenses (detection & fine-tuning)	93%
Prompt injection defenses (online monitoring)	98%
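The "up to" figures are maxima, so they describe the attacker's best case among the tested configurations rather than a typical outcome. A minimal sketch of that aggregation follows, assuming the maximum is taken over every tested defense and model within each defense family. The defense names, model names, and rates are hypothetical placeholders, not values from the paper.

```python
# Placeholder per-configuration attack success rates, NOT values from the paper.
# Keys are (defense family, hypothetical defense name, hypothetical model name).
results = {
    ("backdoor", "defense_a", "model_1"): 0.62,
    ("backdoor", "defense_b", "model_2"): 0.81,
    ("prompt_injection", "defense_c", "model_1"): 0.90,
    ("prompt_injection", "defense_d", "model_2"): 0.74,
}


def max_asr_by_family(results):
    """Highest attack success rate observed within each defense family."""
    best = {}
    for (family, _defense, _model), asr in results.items():
        best[family] = max(asr, best.get(family, 0.0))
    return best


for family, asr in max_asr_by_family(results).items():
    print(f"{family}: up to {asr:.0%}")
```

A per-family average or a full per-defense breakdown would give a fuller picture of how consistently the attack survives. The table above reports only the maxima.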

Implications for Enterprise AI

For organizations deploying LLMs in critical workflows, including supply chain management, logistics optimization, and trade documentation, this research highlights a new vulnerability. Because Cordyceps can encode arbitrary instructions through semantic associations, even models fine-tuned on apparently benign data could harbor hidden backdoors. Enterprises relying on third-party fine-tuning or uncurated datasets must reassess their AI supply chain security. The paper, authored by Zedian Shao, Charles Fleming, and Teodora Baluta, is a wake-up call for adopting more robust validation and monitoring practices in AI systems.

