
Primacy Bias in Multimodal RAG: First Retrieved Items Dominate, Study Finds

A research paper titled 'Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering' introduces a controlled probe to measure position bias in multimodal KB-VQA. The study finds a strong primacy effect, where the first retrieved passage significantly outperforms later ones, contrasting with the U-shaped 'lost-in-the-middle' pattern in text-only models. The findings call for reader-side interventions and question the adequacy of recall@k as a metric for deployed systems.

iGEN Editorial
June 16, 2026

Enterprise AI systems increasingly rely on retrieval-augmented generation (RAG) to answer questions that exceed a model's parametric knowledge. A new study reports a finding that matters for organizations deploying multimodal RAG: the order of retrieved passages introduces a strong primacy bias, in which the first retrieved item disproportionately influences the answer.

The paper, titled 'Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering' and published on arXiv by Liu, Jieyuan; Gu, Jianyang; Chen, Shijie; Jefferson; Wang; and Zhen, identifies a systematic position effect in knowledge-based visual question answering (KB-VQA). Pure-text long-context LLMs exhibit a U-shaped 'lost-in-the-middle' effect, in which information at the start and end of the context is used but information in the middle is lost. The multimodal setting replaces this pattern with a primacy shape: placing the gold passage first beats placing it last by 16 to 26 points on every reader-by-benchmark cell tested.

Methodology and Key Results

The researchers designed a 'gold-position protocol' to isolate position dependence. They tested three open-source 7B/8B vision-language model (VLM) readers on two KB-VQA benchmarks with up to 20 retrieved passages (k=20). The effect, dubbed 'Lost at the End', was consistent: the first retrieved passage was used significantly more than any other position.
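The idea behind a gold-position probe can be sketched in a few lines. This is a minimal illustration, not the authors' released protocol: the function names, the example dictionary keys, and the exact-match scoring are all assumptions, and the image input that a real VLM reader would receive is omitted for brevity.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical reader signature: (question, ordered passages) -> answer string.
Reader = Callable[[str, List[str]], str]


def place_gold(gold: str, distractors: Sequence[str], position: int) -> List[str]:
    """Return a context list with the gold passage at the given 0-based slot."""
    if not 0 <= position <= len(distractors):
        raise ValueError("position must be between 0 and len(distractors)")
    context = list(distractors)
    context.insert(position, gold)
    return context


def accuracy_by_gold_position(
    examples: Sequence[dict], reader: Reader, positions: Sequence[int]
) -> Dict[int, float]:
    """Measure exact-match accuracy with the gold passage forced into each slot."""
    results: Dict[int, float] = {}
    for pos in positions:
        correct = 0
        for ex in examples:
            context = place_gold(ex["gold"], ex["distractors"], pos)
            prediction = reader(ex["question"], context)
            correct += int(prediction == ex["answer"])
        results[pos] = correct / len(examples)
    return results


# Toy stand-in for a reader with extreme primacy: it only "reads" slot 0.
def first_slot_reader(question: str, passages: List[str]) -> str:
    return passages[0]


toy_examples = [
    {"question": "q1", "gold": "A", "distractors": ["x", "y", "z"], "answer": "A"},
    {"question": "q2", "gold": "B", "distractors": ["u", "v", "w"], "answer": "B"},
]
print(accuracy_by_gold_position(toy_examples, first_slot_reader, [0, 3]))
```

Because the retrieved set is held fixed and only the gold slot moves, any accuracy difference between positions can be attributed to the reader rather than the retriever.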

Metric | Reported value
Primacy gap magnification (multimodal relative to text-only baseline) | 2.2 to 4.5 times
Accuracy drop (gold passage first vs. last) | 16 to 26 percentage points

Through three targeted ablations, the team narrowed the cause. A text-only control showed that the multimodal setting amplifies an already-present text-mode primacy by 2.2 to 4.5 times. Image-position and distractor-shuffle ablations together pinpointed the locus to prompt slot 0 of the instruction-tuned reader.
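One plausible reading of the amplification figure is a ratio of two primacy gaps, where each gap is gold-at-first accuracy minus gold-at-last accuracy. The sketch below assumes that definition; the article does not spell out the paper's exact formula, and the accuracy values are invented purely to show the arithmetic.

```python
def primacy_gap(acc_gold_first: float, acc_gold_last: float) -> float:
    """Gap in percentage points between gold-at-first and gold-at-last accuracy."""
    return acc_gold_first - acc_gold_last


def amplification(multimodal_gap: float, text_only_gap: float) -> float:
    """Ratio of the multimodal gap to the text-only gap (assumed definition)."""
    if text_only_gap <= 0:
        raise ValueError("text-only gap must be positive for a ratio to make sense")
    return multimodal_gap / text_only_gap


# Illustrative numbers only, NOT taken from the paper.
mm_gap = primacy_gap(acc_gold_first=60.0, acc_gold_last=40.0)    # 20.0 points
text_gap = primacy_gap(acc_gold_first=58.0, acc_gold_last=50.0)  # 8.0 points
print(mm_gap, text_gap, amplification(mm_gap, text_gap))
```

With these made-up inputs, a 20-point multimodal gap over an 8-point text-only gap gives a factor of 2.5, which would fall inside the reported 2.2 to 4.5 range.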

Implications for Enterprise AI

The findings have direct implications for any enterprise using multimodal RAG in production, for example in automated customer support, document analysis, or visual inspection. The researchers note that 'recall@k is the wrong metric for deployed KB-VQA': the metric only records whether the correct passage was retrieved, while the passage's position determines how much the reader actually uses it. Closing this gap requires reader-side intervention, not just improvements in retrieval quality.
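The position-blindness of recall@k is easy to demonstrate. The following sketch uses the standard definition of recall@k with hypothetical passage IDs; it shows that two rankings which differ only in where the gold passage sits receive the same score.

```python
from typing import Sequence


def recall_at_k(retrieved_ids: Sequence[str], gold_ids: Sequence[str], k: int) -> float:
    """Fraction of gold passages that appear anywhere in the top-k results."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold_ids must not be empty")
    return len(set(retrieved_ids[:k]) & gold) / len(gold)


# Hypothetical rankings: same passages, gold ("g") first vs. last.
gold_first = ["g", "d1", "d2", "d3", "d4"]
gold_last = ["d1", "d2", "d3", "d4", "g"]

print(recall_at_k(gold_first, ["g"], k=5))  # 1.0
print(recall_at_k(gold_last, ["g"], k=5))   # 1.0
print(gold_first.index("g"), gold_last.index("g"))  # 0 4
```

Both rankings score a perfect recall@5, yet under the reported primacy effect a reader would be expected to answer far more accurately on the first ranking than on the second.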

Three retrieval-side fixes were tested on a frozen reader: MMR (Maximal Marginal Relevance), oracle reranking, and rank-based reordering. None produced a separable improvement, leaving the gap intact. The authors conclude that 'closing the gap requires reader-side intervention' and release their protocol as a controlled instrument for evaluating such interventions.
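For readers unfamiliar with MMR, the sketch below shows the textbook greedy formulation: each step picks the passage that maximizes a weighted difference between relevance to the query and similarity to passages already selected. The embedding vectors, the cosine similarity choice, and the `lambda_` values are assumptions for illustration; the article does not describe the configuration used in the paper.

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_order(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    lambda_: float = 0.7,
    k: Optional[int] = None,
) -> List[int]:
    """Greedy MMR: return document indices in selection order."""
    k = len(doc_vecs) if k is None else min(k, len(doc_vecs))
    selected: List[int] = []
    remaining = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_ * relevance - (1.0 - lambda_) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-D embeddings: docs 0 and 1 are near-duplicates, doc 2 is distinct.
query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.0, 1.0]]
print(mmr_order(query, docs, lambda_=1.0))  # pure relevance ordering
print(mmr_order(query, docs, lambda_=0.5))  # diversity promotes doc 2
```

MMR changes which passages are selected and in what order, but it leaves the reader untouched, which is consistent with the study's report that retrieval-side fixes on a frozen reader did not close the gap.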

For technology leaders evaluating RAG systems, this research underscores the need to audit not only retrieval accuracy but also how the reader consumes retrieved context. Organizations should test for position bias in their specific use cases and consider reader modifications, such as attention masking or re-weighting, so that a retrieved passage is used regardless of where it appears in the prompt.

'Our findings indicate that recall@k is the wrong metric for deployed KB-VQA and that closing the gap requires reader-side intervention,' the authors write.

The study is available under a Creative Commons Attribution 4.0 International license.

