Enterprises deploying multimodal large language models (MLLMs) often use cascades to reduce computational costs: a weaker, cheaper model handles most queries, with a stronger model used only when the weak model lacks confidence. However, a new attack exposes a critical vulnerability in this cost-saving architecture.
According to a paper on arXiv titled "Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades," researchers Liu, Zhongye, Zeng, Yaopei, Chang, Yurui, Lin, and Lu demonstrated the Forced Deferral Attack (FDA), which deliberately lowers the weak model's confidence so that queries are deferred to the strong model.
How the Forced Deferral Attack Works
The paper explains that MLLM cascades rely on the weak model's confidence score to decide whether to route a query to the strong model. An adversary can introduce a universal border trigger, a single adversarial image perturbation, that consistently reduces the weak model's confidence. FDA learns this trigger by optimizing a temperature-flattened objective, which pushes the weak model's token distribution on triggered inputs toward less concentrated targets derived from its clean responses.
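To make the routing step concrete, here is a minimal sketch of a confidence-gated cascade. It is an illustration, not the paper's setup: the confidence measure (geometric-mean token probability), the function names, and the 0.7 threshold are all assumptions chosen for the example.

```python
import math
from typing import List


def mean_token_confidence(token_logprobs: List[float]) -> float:
    """Assumed confidence metric: geometric mean of the token probabilities,
    computed as exp(average log-probability). Real cascades may use others."""
    if not token_logprobs:
        return 0.0  # no tokens generated: treat as zero confidence
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(token_logprobs: List[float], threshold: float = 0.7) -> str:
    """Keep the weak model's answer when it is confident enough;
    otherwise defer to the strong model. The threshold is hypothetical."""
    confidence = mean_token_confidence(token_logprobs)
    return "weak" if confidence >= threshold else "strong"


# A confident weak-model response stays on the cheap path.
print(route([-0.05, -0.10]))
# A low-confidence response is deferred to the strong model.
print(route([-1.0, -1.2]))
```

An attack of the kind described above does not need to change the answer. It only needs to push the confidence value below the threshold so that the second branch is taken.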
“FDA learns a universal border trigger by optimizing a temperature-flattened objective,” the researchers reported. The attack is designed to work across datasets, model families, and deferral metrics.
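The idea of a temperature-flattened target can be illustrated with a short sketch. It assumes the target is obtained by applying a softmax with temperature above 1 to the weak model's clean logits, and it uses KL divergence as a stand-in loss. The paper's exact objective is not reproduced here, and the logits below are made-up values.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax with temperature; values above 1 flatten the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means less concentrated."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(target: List[float], predicted: List[float]) -> float:
    """KL(target || predicted), used here as an assumed stand-in loss."""
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0)


clean_logits = [4.0, 1.0, 0.5]            # hypothetical next-token logits
sharp = softmax(clean_logits)              # clean, confident distribution
flat_target = softmax(clean_logits, 4.0)   # flattened, less concentrated target

print(max(sharp), max(flat_target))        # top-token probability drops
print(entropy(sharp), entropy(flat_target))
print(kl_divergence(flat_target, sharp))   # gap a trigger would try to close
```

In an actual attack, an optimizer would adjust the trigger pixels so that the weak model's output on triggered images moves toward the flattened target. That optimization loop is omitted here.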
Attack Performance Compared to Baselines
The researchers evaluated FDA against image-perturbation and prompt-injection baselines. According to the paper, FDA consistently increases strong-model routing and outperforms the baselines. This shows that MLLM cascades are vulnerable to attacks that manipulate compute allocation, forcing unintended strong-model usage without directly targeting answer correctness.
| Attack Method | Strong-Model Routing Increase (qualitative, per the paper) |
|---|---|
| Forced Deferral Attack (FDA) | Higher (outperforms baselines) |
| Image-Perturbation Baseline | Lower |
| Prompt-Injection Baseline | Lower |
Implications for Enterprise AI Deployments
For technology leaders, this attack represents a new security consideration when deploying cost-optimized AI pipelines. Cascades are used not only in LLM inference but also in multimodal systems where vision and language combine. If left unaddressed, such attacks could lead to unanticipated cost increases as compute is siphoned to more expensive models. The paper notes that the attack does not target answer correctness, making it potentially stealthy.
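A back-of-the-envelope calculation shows why a shift in deferral rate matters. The sketch below assumes the weak model runs on every query and the strong model runs only on deferred ones. The cost units and deferral rates are hypothetical and are not figures from the paper.

```python
def expected_cost_per_query(
    deferral_rate: float, weak_cost: float, strong_cost: float
) -> float:
    """Expected cost of one query in a two-stage cascade, assuming the weak
    model always runs and the strong model runs only on deferred queries."""
    if not 0.0 <= deferral_rate <= 1.0:
        raise ValueError("deferral_rate must be between 0 and 1")
    return weak_cost + deferral_rate * strong_cost


# Hypothetical prices: weak call = 1 unit, strong call = 20 units.
baseline = expected_cost_per_query(0.10, weak_cost=1.0, strong_cost=20.0)
attacked = expected_cost_per_query(0.90, weak_cost=1.0, strong_cost=20.0)

print(baseline)             # 10% of queries deferred
print(attacked)             # 90% of queries deferred
print(attacked / baseline)  # cost multiplier under these assumptions
```

Under these illustrative numbers, raising the deferral rate from 10% to 90% multiplies the per-query cost by more than six, even though no answer needs to change.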
The findings highlight the need for robust deferral mechanisms that are resistant to adversarial manipulation of confidence scores. Enterprises should evaluate the security posture of their AI routing decisions, particularly when cascades are integrated into customer-facing or revenue-critical applications.