Primacy Bias in Multimodal RAG: First Retrieved Items Dominate, Study Finds

A research paper titled 'Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering' introduces a controlled probe to measure position bias in multimodal KB-VQA. The study finds a strong primacy effect: readers answer far more accurately when the gold passage is placed first than when it is placed last, in contrast to the U-shaped 'lost-in-the-middle' pattern seen in text-only models. The findings call for reader-side interventions and question the adequacy of recall@k as a metric for deployed systems.

iGEN Editorial

June 16, 2026

Enterprise AI systems increasingly rely on retrieval-augmented generation (RAG) to answer questions that exceed a model's parametric knowledge. However, a new study reports a finding that matters for organizations deploying multimodal RAG: the reader shows a strong primacy bias, in which the first retrieved item disproportionately influences the answer.

The paper, titled 'Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering' and published on arXiv by Liu, Jieyuan; Gu, Jianyang; Chen, Shijie; Jefferson; Wang; and Zhen, identifies a systematic position effect in knowledge-based visual question answering (KB-VQA). Pure-text long-context LLMs exhibit a U-shaped 'lost-in-the-middle' effect, in which information at the start and end of the context is used but information in the middle is lost. The multimodal setting replaces this pattern with a primacy shape: placing the gold passage first beats placing it last by 16 to 26 percentage points on every reader-by-benchmark cell tested.

Methodology and Key Results

The researchers designed a 'gold-position protocol' to isolate position dependence. They tested three open-source 7B/8B vision-language model (VLM) readers on two KB-VQA benchmarks with up to 20 retrieved passages (k=20). The effect, dubbed 'Lost at the End', was consistent: the first retrieved passage was used significantly more than any other position.
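A gold-position probe of this kind can be sketched in a few lines. The article does not describe the paper's exact procedure, so the sketch below is an assumption: it simply inserts the known-correct (gold) passage at each possible slot among a fixed list of distractors, producing one context per position. The function names `place_gold` and `gold_position_sweep` are hypothetical.

```python
from typing import List, Sequence


def place_gold(gold: str, distractors: Sequence[str], position: int) -> List[str]:
    """Return a context with `gold` at index `position`.

    Distractors keep their original relative order, so the only thing
    that changes between contexts is where the gold passage sits.
    """
    k = len(distractors) + 1  # total number of passages in the context
    if not 0 <= position < k:
        raise ValueError(f"position must be in [0, {k - 1}]")
    context = list(distractors)
    context.insert(position, gold)
    return context


def gold_position_sweep(gold: str, distractors: Sequence[str]) -> List[List[str]]:
    """Build one context per possible gold position (0 .. k-1)."""
    return [place_gold(gold, distractors, p) for p in range(len(distractors) + 1)]
```

Running the same reader over every context in the sweep and recording accuracy per position would then show whether accuracy depends on where the gold passage appears.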

Metric	Reported value
Primacy gap magnification (multimodal vs. text-only baseline)	2.2 to 4.5 times
Accuracy drop (gold passage first vs. last)	16 to 26 percentage points

Through three targeted ablations, the team narrowed the cause. A text-only control showed that the multimodal setting amplifies an already-present text-mode primacy by 2.2 to 4.5 times. Image-position and distractor-shuffle ablations together pinpointed the locus to prompt slot 0 of the instruction-tuned reader.
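The two headline quantities can be read as simple arithmetic. The sketch below assumes that the primacy gap is accuracy with gold first minus accuracy with gold last, and that the amplification factor is the multimodal gap divided by the text-only gap; the article does not spell out the paper's definitions. The accuracy values are hypothetical placeholders chosen only to show the calculation, not results from the study.

```python
def primacy_gap(acc_gold_first: float, acc_gold_last: float) -> float:
    """Accuracy (in percentage points) lost when gold moves from first to last."""
    return acc_gold_first - acc_gold_last


def amplification(multimodal_gap: float, text_only_gap: float) -> float:
    """Ratio of the multimodal primacy gap to the text-only primacy gap."""
    if text_only_gap == 0:
        raise ValueError("text-only gap is zero; the ratio is undefined")
    return multimodal_gap / text_only_gap


# Hypothetical accuracies (percent), NOT figures from the paper.
mm_gap = primacy_gap(acc_gold_first=60.0, acc_gold_last=40.0)    # 20.0
text_gap = primacy_gap(acc_gold_first=58.0, acc_gold_last=50.0)  # 8.0
ratio = amplification(mm_gap, text_gap)                          # 2.5
```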

Implications for Enterprise AI

The findings have direct implications for any enterprise using multimodal RAG in production, for example in automated customer support, document analysis, or visual inspection. The researchers note that 'recall@k is the wrong metric for deployed KB-VQA' because even when the correct passage is retrieved, its position determines how much it influences the answer. Closing this gap requires reader-side intervention, not just improvements in retrieval quality.
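A minimal sketch shows why recall@k cannot see the problem. Using the standard definition (the fraction of gold items found in the top k retrieved results), the score is identical whether the gold passage is ranked first or last within the top k. The passage IDs below are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(retrieved_ids: Sequence[str], gold_ids: Set[str], k: int) -> float:
    """Fraction of gold items that appear anywhere in the top-k results."""
    if not gold_ids:
        raise ValueError("gold_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(gold_ids & top_k) / len(gold_ids)


gold = {"gold"}
gold_first = ["gold", "d1", "d2", "d3"]  # gold passage at rank 1
gold_last = ["d1", "d2", "d3", "gold"]   # gold passage at rank 4

# Both rankings get a perfect score at k=4, although the study reports
# that the reader treats the two orderings very differently.
same_score = recall_at_k(gold_first, gold, 4) == recall_at_k(gold_last, gold, 4)
```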

Three retrieval-side fixes were tested on a frozen reader: MMR (Maximal Marginal Relevance), oracle reranking, and rank-based reordering. None produced a separable improvement, leaving the gap intact. The authors conclude that 'closing the gap requires reader-side intervention' and release their protocol as a controlled instrument for evaluating such interventions.
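For readers unfamiliar with MMR, the sketch below follows the standard formulation: passages are picked greedily, each time choosing the one that maximizes `lam * relevance - (1 - lam) * redundancy`, where redundancy is the highest similarity to any passage already selected. It is a generic illustration, not the configuration used in the paper. The toy embedding vectors, the cosine similarity choice, and the `lam` values are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_order(query: Sequence[float], docs: Sequence[Sequence[float]],
              lam: float = 0.5) -> List[int]:
    """Return document indices in greedy MMR order."""
    remaining = list(range(len(docs)))
    selected: List[int] = []

    def score(i: int) -> float:
        relevance = cosine(query, docs[i])
        # Redundancy is 0.0 for the first pick, when nothing is selected yet.
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy vectors: docs 0 and 1 are near-duplicates; doc 2 is less relevant but distinct.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.5, -0.5]]
diverse = mmr_order(query, docs, lam=0.5)   # [0, 2, 1]
relevant = mmr_order(query, docs, lam=1.0)  # [0, 1, 2], pure relevance ranking
```

MMR changes which passages follow the top-ranked one, but the reader still sees some passage in the first slot, which is consistent with the article's report that retrieval-side reordering left the gap intact.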

For technology leaders evaluating RAG systems, this research underscores the need to audit not only retrieval accuracy but also how the reader consumes retrieved context. Organizations should test for position bias in their specific use cases and consider reader-side modifications, such as attention masking or re-weighting, so that passages in later positions are not underused.

'Our findings indicate that recall@k is the wrong metric for deployed KB-VQA and that closing the gap requires reader-side intervention,' the authors write.

The study is available under a Creative Commons Attribution 4.0 International license.
