New Attack Forces Costly Model Usage in Multimodal LLM Cascades

A research paper introduces the Forced Deferral Attack (FDA), which lowers the weak model's confidence in multimodal large language model (MLLM) cascades so that queries are routed to the more expensive strong model. The attack raises security concerns for enterprises deploying cost-optimized AI systems.

iGEN Editorial

June 16, 2026

Enterprises deploying multimodal large language models (MLLMs) often use cascades to reduce computational costs: a weaker, cheaper model handles most queries, with a stronger model used only when the weak model lacks confidence. However, a new attack exposes a critical vulnerability in this cost-saving architecture.
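
The routing rule described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the `route` function, the `CascadeResult` type, the toy models, and the 0.7 threshold are all hypothetical, and a real deployment would derive confidence from the weak model's output rather than from a keyword check.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CascadeResult:
    answer: str
    routed_to: str      # "weak" or "strong"
    confidence: float   # weak model's confidence for this query


def route(
    query: str,
    weak_model: Callable[[str], Tuple[str, float]],
    strong_model: Callable[[str], str],
    threshold: float = 0.7,  # hypothetical deferral threshold
) -> CascadeResult:
    """Answer with the weak model unless its confidence falls below threshold."""
    answer, confidence = weak_model(query)
    if confidence >= threshold:
        return CascadeResult(answer, "weak", confidence)
    # Low confidence: defer to the stronger, more expensive model.
    return CascadeResult(strong_model(query), "strong", confidence)


# Toy stand-ins for real models (illustration only).
def toy_weak(query: str) -> Tuple[str, float]:
    return ("cat", 0.95) if "clear" in query else ("unsure", 0.30)


def toy_strong(query: str) -> str:
    return "tabby cat"


print(route("clear photo", toy_weak, toy_strong).routed_to)
print(route("blurry photo", toy_weak, toy_strong).routed_to)
```

The only input to the routing decision here is the weak model's own confidence, which is why anything that depresses that number also changes where the query is served.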

According to a paper on arXiv titled “Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades,” researchers Liu, Zhongye, Zeng, Yaopei, Chang, Yurui, Lin, and Lu demonstrated the Forced Deferral Attack (FDA), which deliberately lowers the weak model's confidence so that queries are deferred to the strong model.

How the Forced Deferral Attack Works

The paper explains that MLLM cascades rely on the weak model's confidence score to decide whether to route a query to the strong model. An adversary can introduce a universal border trigger — an adversarial image perturbation — that consistently reduces the weak model's confidence. The FDA learns this trigger by optimizing a temperature-flattened objective, which pushes the weak model's token distribution on triggered inputs toward less concentrated targets derived from its clean responses.
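
To make "temperature-flattened" concrete, the sketch below shows how dividing logits by a temperature above 1 yields a less concentrated token distribution. It is an illustration under stated assumptions, not the paper's objective: the logits, the temperature of 4.0, and the use of cross-entropy as the distance to the target are all hypothetical choices, since the article does not give the exact loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means less concentrated."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cross_entropy(target, predicted):
    """Cross-entropy of predicted against a soft target distribution."""
    return -sum(t * math.log(q) for t, q in zip(target, predicted) if t > 0)


# Hypothetical weak-model logits for one token position on a clean input.
clean_logits = [4.0, 1.0, 0.5, 0.0]

clean_probs = softmax(clean_logits)                    # what the model outputs
flat_target = softmax(clean_logits, temperature=4.0)   # flattened target

print("max prob, clean vs. target:", max(clean_probs), max(flat_target))
print("entropy, clean vs. target:", entropy(clean_probs), entropy(flat_target))
# An attacker would adjust the trigger to shrink this loss on triggered inputs.
print("loss at clean output:", cross_entropy(flat_target, clean_probs))
```

In this sketch the most likely token stays the same after flattening, because temperature scaling preserves the ordering of the logits; only the confidence attached to it drops.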

“FDA learns a universal border trigger by optimizing a temperature-flattened objective,” the researchers reported. The attack is designed to work across datasets, model families, and deferral metrics.
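
One way to picture a universal border trigger is as a single fixed frame of pixels applied to every input image while the interior is left untouched. The sketch below assumes exactly that and nothing more: the article does not say whether the trigger replaces or is added to the border pixels, and `border_mask`, `apply_trigger`, the border width, and the grayscale nested-list image format are hypothetical.

```python
def border_mask(height, width, border):
    """Return a 0/1 grid that is 1 inside a frame of the given pixel width."""
    return [
        [
            1 if (r < border or r >= height - border
                  or c < border or c >= width - border) else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


def apply_trigger(image, trigger, mask):
    """Overwrite masked pixels with the trigger; leave the rest of the image."""
    return [
        [trigger[r][c] if mask[r][c] else image[r][c]
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Two different 4x4 grayscale images and one shared (universal) trigger.
image_a = [[10] * 4 for _ in range(4)]
image_b = [[200] * 4 for _ in range(4)]
trigger = [[255] * 4 for _ in range(4)]  # placeholder values, not learned
mask = border_mask(4, 4, border=1)

for row in apply_trigger(image_a, trigger, mask):
    print(row)
```

Because the same `trigger` and `mask` are reused for `image_a` and `image_b`, the perturbation does not have to be recomputed per input, which is what "universal" refers to.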

Attack Performance Compared to Baselines

The researchers evaluated FDA against image-perturbation and prompt-injection baselines. According to the paper, FDA consistently increases strong-model routing and outperforms the baselines. This shows that MLLM cascades are vulnerable to attacks that manipulate compute allocation, forcing unintended strong-model usage without directly targeting answer correctness.

Attack Method	Effectiveness (Strong-Model Routing Increase)
Forced Deferral Attack (FDA)	Higher (outperforms baselines)
Image-Perturbation Baseline	Lower
Prompt-Injection Baseline	Lower

Implications for Enterprise AI Deployments

For technology leaders, this attack represents a new security consideration when deploying cost-optimized AI pipelines. Cascades are used not only in text-only LLM inference but also in multimodal systems that combine vision and language. If left unaddressed, such attacks could lead to unanticipated cost increases as queries are diverted to the more expensive model. The paper notes that the attack does not target answer correctness, making it potentially stealthy.
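
The cost exposure can be estimated with simple arithmetic. The sketch below assumes the weak model runs on every query and the strong model runs only on deferred ones; the per-query costs and both deferral rates are made-up numbers for illustration, not results from the paper.

```python
def expected_cost_per_query(weak_cost, strong_cost, deferral_rate):
    """Weak model runs on every query; strong model only on deferred ones."""
    if not 0.0 <= deferral_rate <= 1.0:
        raise ValueError("deferral_rate must be between 0 and 1")
    return weak_cost + deferral_rate * strong_cost


# Hypothetical relative costs and deferral rates.
WEAK_COST = 1.0
STRONG_COST = 20.0

clean = expected_cost_per_query(WEAK_COST, STRONG_COST, deferral_rate=0.2)
attacked = expected_cost_per_query(WEAK_COST, STRONG_COST, deferral_rate=0.9)

print(f"clean: {clean:.1f}  attacked: {attacked:.1f}  "
      f"ratio: {attacked / clean:.2f}x")
```

Under these assumed numbers, a rise in deferral rate alone multiplies per-query spend several times over, even though the answers users receive may look unchanged.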

The findings highlight the need for robust deferral mechanisms that are resistant to adversarial manipulation of confidence scores. Enterprises should evaluate the security posture of their AI routing decisions, particularly when cascades are integrated into customer-facing or revenue-critical applications.
