Enterprise AI systems increasingly rely on retrieval-augmented generation (RAG) to answer questions that exceed a model's parametric knowledge. A new study reports a finding that matters for organizations deploying multimodal RAG: the order of the retrieved passages induces a strong primacy bias, in which the first retrieved item disproportionately influences the answer.
The paper, titled 'Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering' and published on arXiv by Liu, Jieyuan; Gu, Jianyang; Chen, Shijie; Jefferson; Wang; and Zhen, identifies a systematic position effect in knowledge-based visual question answering (KB-VQA). Pure-text long-context LLMs exhibit a U-shaped 'lost-in-the-middle' effect, in which information at the start and end of the context is used but information in the middle is lost. The multimodal setting replaces this pattern with a primacy shape: gold-at-first beats gold-at-last by 16 to 26 percentage points on every reader-by-benchmark cell tested.
Methodology and Key Results
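The researchers designed a 'gold-position protocol' to isolate position dependence: the passage containing the answer (the gold passage) is placed at a controlled position among the retrieved passages, and reader accuracy is compared across positions. They tested three open-source 7B/8B vision-language model (VLM) readers on two KB-VQA benchmarks with up to 20 retrieved passages (k=20). The effect, dubbed 'Lost at the End', was consistent: a passage in the first position was used significantly more than a passage in any other position.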
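A gold-position sweep of this kind can be sketched in a few lines. This is a minimal illustration, not the authors' released protocol: the `Reader` callable, the example tuple layout, and the exact-match scoring are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical reader interface: (question, ordered passages) -> answer string.
Reader = Callable[[str, List[str]], str]
# Hypothetical example layout: (question, answer, gold passage, distractors).
Example = Tuple[str, str, str, Sequence[str]]


def place_gold(gold: str, distractors: Sequence[str], position: int) -> List[str]:
    """Return a context with the gold passage inserted at `position`."""
    if not 0 <= position <= len(distractors):
        raise ValueError("position out of range")
    context = list(distractors)
    context.insert(position, gold)
    return context


def accuracy_by_gold_position(
    reader: Reader, examples: Sequence[Example], positions: Sequence[int]
) -> Dict[int, float]:
    """Exact-match accuracy of `reader` for each gold position."""
    if not examples:
        raise ValueError("need at least one example")
    results: Dict[int, float] = {}
    for pos in positions:
        correct = 0
        for question, answer, gold, distractors in examples:
            prediction = reader(question, place_gold(gold, distractors, pos))
            correct += int(prediction.strip().lower() == answer.strip().lower())
        results[pos] = correct / len(examples)
    return results


def primacy_gap(acc: Dict[int, float]) -> float:
    """Accuracy with gold first minus accuracy with gold last, in percentage points."""
    first, last = min(acc), max(acc)
    return 100.0 * (acc[first] - acc[last])
```

Passing a production reader as `reader` and sweeping `positions` from 0 to k-1 gives a per-position accuracy curve; a large positive `primacy_gap` is the pattern the paper describes.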
| Metric | Reported value |
|---|---|
| Primacy gap magnification | 2.2 to 4.5 times relative to text-only baseline |
| Accuracy drop (first vs. last gold passage) | 16 to 26 percentage points |
Through three targeted ablations, the team narrowed down the cause. A text-only control showed that primacy is already present in text mode and that the multimodal setting amplifies it by 2.2 to 4.5 times. Image-position and distractor-shuffle ablations together localized the effect to prompt slot 0 of the instruction-tuned reader.
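One plausible reading of the amplification figure is the ratio of the multimodal primacy gap to the text-only primacy gap. The sketch below assumes that definition; the function name and the input numbers are illustrative and are not taken from the paper.

```python
def amplification(multimodal_gap: float, text_only_gap: float) -> float:
    """Ratio of the multimodal primacy gap to the text-only primacy gap.

    Both gaps are in percentage points (gold-first minus gold-last accuracy).
    """
    if text_only_gap <= 0:
        raise ValueError("text-only gap must be positive for the ratio to be meaningful")
    return multimodal_gap / text_only_gap


# Illustrative numbers only, not values from the paper.
print(amplification(20.0, 8.0))  # 2.5
```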
Implications for Enterprise AI
The findings have direct implications for any enterprise using multimodal RAG in production, for example in automated customer support, document analysis, or visual inspection. The researchers note that 'recall@k is the wrong metric for deployed KB-VQA': even when the correct passage is retrieved, its position determines how much it influences the answer.
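The reason recall@k cannot register this effect is that it only checks whether the gold passage is in the top k, not where it sits. A minimal sketch, assuming a single gold passage per question and hypothetical passage IDs:

```python
from typing import Sequence


def recall_at_k(ranked_ids: Sequence[str], gold_id: str, k: int) -> float:
    """1.0 if the gold passage appears among the top-k retrieved IDs, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


k = 20
distractors = [f"d{i}" for i in range(k - 1)]  # hypothetical distractor IDs
gold_first = ["gold"] + distractors
gold_last = distractors + ["gold"]

# Same score whether the gold passage is ranked first or last.
print(recall_at_k(gold_first, "gold", k), recall_at_k(gold_last, "gold", k))  # 1.0 1.0
```

Two retrievers with identical recall@k can therefore produce very different answer accuracy if they tend to place the gold passage at different positions.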
Three retrieval-side fixes were tested on a frozen reader: MMR (Maximal Marginal Relevance), oracle reranking, and rank-based reordering. None produced a separable improvement, leaving the gap intact. The authors conclude that 'closing the gap requires reader-side intervention' and release their protocol as a controlled instrument for evaluating such interventions.
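For readers unfamiliar with MMR, the following is a textbook-style sketch of greedy Maximal Marginal Relevance over embedding vectors. It is not the authors' configuration: the cosine similarity, the `lam` trade-off parameter, and the toy vectors are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Vector, docs: Sequence[Vector], k: int, lam: float = 0.5) -> List[int]:
    """Greedily pick up to k document indices, trading relevance against redundancy.

    Each step selects the candidate maximizing
    lam * sim(query, doc) - (1 - lam) * max sim(doc, already selected).
    """
    relevance = [cosine(query, d) for d in docs]
    candidates = list(range(len(docs)))
    selected: List[int] = []

    def score(i: int) -> float:
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors: doc 1 is a near-duplicate of doc 0; doc 2 is less relevant but distinct.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [1.0, 0.01], [0.6, 0.8]]
print(mmr(query, docs, k=2, lam=0.3))  # [0, 2]
print(mmr(query, docs, k=2, lam=1.0))  # [0, 1]
```

MMR changes which passages are selected and in what order, but the reader still sees an ordered list, which is consistent with the paper's report that retrieval-side reordering did not close the gap.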
For technology leaders evaluating RAG systems, this research underscores the need to audit not only retrieval accuracy but also how the reader consumes the retrieved context. Organizations should test for position bias in their specific use cases and consider reader-side modifications, such as attention masking or re-weighting, so that the reader uses every retrieved passage regardless of its position.
The study is available under a Creative Commons Attribution 4.0 International license.