Large language models (LLMs) are increasingly fine-tuned on uncurated text datasets, creating an opening for adversaries to inject malicious behavior. A new attack method, named Cordyceps, poses a subtler and more persistent threat than earlier poisoning attacks. According to a paper on arXiv, Cordyceps teaches an LLM an information-hiding scheme via data poisoning, enabling covert control over the model's outputs without relying on fixed trigger phrases.
The Attack Mechanism
Traditional poisoning attacks depend on specific trigger words that defenses can detect and neutralize. Cordyceps, by contrast, builds semantic associations between shared knowledge (such as common facts or concepts) and attacker-chosen phrases. The paper explains that these associations induce a hiding scheme capable of encoding and decoding arbitrary malicious instructions. The name comes from the parasitic fungus that takes over its host, reflecting the attack's ability to subvert the model from within.
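The idea of a hiding scheme built on associations can be illustrated with a toy codebook. This is a minimal sketch and not the scheme from the paper: the `CODEBOOK` entries, the `encode` and `decode` helpers, and the harmless example instruction are all hypothetical. In the attack described above, such associations would be learned by the model from poisoned training data rather than stored in an explicit table.

```python
from typing import Dict, List

# Hypothetical codebook (illustration only, not taken from the paper):
# each innocuous "shared knowledge" cue stands for one attacker-chosen word.
CODEBOOK: Dict[str, str] = {
    "the capital of France": "reply",
    "the boiling point of water": "in",
    "the largest planet": "uppercase",
}

# Inverse mapping, used to hide an instruction behind innocuous cues.
REVERSE: Dict[str, str] = {word: cue for cue, word in CODEBOOK.items()}


def encode(instruction: str) -> List[str]:
    """Replace each word of the instruction with its innocuous cue."""
    return [REVERSE[word] for word in instruction.split()]


def decode(cues: List[str]) -> str:
    """Recover the hidden instruction from a sequence of cues."""
    return " ".join(CODEBOOK[cue] for cue in cues)


hidden = encode("reply in uppercase")
recovered = decode(hidden)
```

Because the cues are ordinary facts rather than a fixed trigger string, a filter that searches for known trigger phrases would find nothing unusual in `hidden`, which is the intuition the paragraph above describes.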
Performance Metrics
The researchers evaluated Cordyceps on 5 LLMs against 3 backdoor defenses and 4 prompt injection defenses. Covert control attacks outperformed heuristic-based prompt injection attacks. The average attack success rate improved by approximately 40% relative to clean fine-tuned models. The paper notes that this advantage holds even when only a small fraction of the training data is poisoned.
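For readers unfamiliar with the metric, attack success rate is commonly computed as the fraction of attack attempts that produce the attacker's intended output. The sketch below assumes that definition; the function name and the sample outcomes are hypothetical and are not data from the paper.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of attack attempts that succeeded (True = success)."""
    if not outcomes:
        raise ValueError("need at least one attack attempt")
    return sum(outcomes) / len(outcomes)


# Made-up outcomes for four attempts, for illustration only.
example_rate = attack_success_rate([True, True, False, True])
```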
Resilience Against Defenses
A key finding is Cordyceps' ability to circumvent existing mitigation strategies. The attack maintained a success rate of up to 93% after backdoor defenses, which typically involve outlier detection or clean-data regularization, were applied. It maintained a success rate of up to 98% after prompt injection defenses, which monitor outputs for malicious instructions. The following table summarizes the attack's persistence:
| Defense Type | Maximum Attack Success Rate |
|---|---|
| Backdoor defenses (detection & fine-tuning) | 93% |
| Prompt injection defenses (online monitoring) | 98% |
Implications for Enterprise AI
For organizations deploying LLMs in critical workflows, including supply chain management, logistics optimization, and trade documentation, this research highlights a new vulnerability. Because Cordyceps can encode arbitrary instructions through semantic associations, even models fine-tuned on apparently benign data could harbor hidden backdoors. Enterprises relying on third-party fine-tuning or uncurated datasets must reassess their AI supply chain security. The paper, authored by Zedian Shao, Charles Fleming, and Teodora Baluta, is a wake-up call to adopt more robust validation and monitoring practices in AI systems.