
Deep Residual Injection Method Enables Full-Spectrum Forensic AI Detection in Multimodal Models

Researchers propose Deep Visual Residual MLLM (Deep-VRM), a method that injects low-level artifact signals into multimodal large language models without disrupting pre-trained semantic knowledge. The paper reports state-of-the-art detection of AI-generated images across most of the benchmarks tested.

iGEN Editorial
June 16, 2026
As AI-generated images become increasingly realistic, traditional semantic-level inconsistency checks are no longer sufficient for reliable detection. A new research paper from a team of computer scientists introduces a method called Deep Visual Residual MLLM (Deep-VRM) that enables multimodal large language models (MLLMs) to capture full-spectrum forensic signals—including low-level generator artifacts—while retaining their pre-trained semantic understanding.

The work, posted on arXiv under the title "Deep Residual Injection for Full-Spectrum Forensic Signal Perception in Multimodal Large Language Models," addresses a critical limitation of fine-tuning MLLMs for artifact learning. The authors (Kaiqing Lin, Zhiyuan Yan, Ruoxin Chen, Ke-Yue Zhang, Piao Zhou, Caiyong, Bin, Taiping Yao, Bo Wang, Youchang Xiao, and Shouhong Ding) conducted a layer-wise analysis of forensic signal perception in MLLMs. They found that semantic information is primarily formed in the early-to-middle layers, and that direct fine-tuning for artifact learning disrupts these semantic representations.

The Challenge of Full-Spectrum Perception

MLLMs have been increasingly adopted in forensics due to their robust semantic understanding. However, as AI-generated images become more realistic, relying solely on semantic-level inconsistencies is often insufficient. The researchers pose a critical question: whether MLLMs can achieve full-spectrum forensic signal perception—capturing low-level generator artifacts without sacrificing pre-trained semantic knowledge.

Deep Residual Injection Method

To solve this, the team proposes Deep-VRM. The architecture preserves early semantic processing while injecting artifact-specific visual signals as a residual path into an intermediate layer. These artifact signals are then fused with semantic token representations and propagated through subsequent trainable layers. This design enables later layers to jointly model semantic reasoning and signal-level forensic cues. Surprisingly, the model learns to adaptively leverage different levels of forensic signals depending on the input, achieving robust and generalizable detection performance.
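The injection step described above can be illustrated with a toy sketch. This is not the authors' implementation: the article does not say how artifact features are extracted or fused, so the additive fusion, the `gate` parameter, the `toy_layer` stand-in for a transformer block, and every name below are hypothetical choices made only for illustration.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def toy_layer(w, tokens):
    """Stand-in for a transformer block: h + tanh(W h), applied per token."""
    return [[h + math.tanh(z) for h, z in zip(tok, matvec(w, tok))]
            for tok in tokens]

def forward(tokens, artifacts, layers, w_proj, inject_at, gate=1.0):
    """Run all layers, adding projected artifact features to the token
    states just before layer `inject_at`. Returns every layer's output."""
    states = []
    h = tokens
    for i, w in enumerate(layers):
        if i == inject_at:
            # Residual injection (assumed additive): semantic + gate * artifact
            h = [[s + gate * p for s, p in zip(tok, matvec(w_proj, art))]
                 for tok, art in zip(h, artifacts)]
        h = toy_layer(w, h)
        states.append(h)
    return states

rng = random.Random(0)
DIM, ART_DIM, N_TOKENS, N_LAYERS, INJECT_AT = 4, 3, 2, 4, 2
layers = [make_matrix(DIM, DIM, rng) for _ in range(N_LAYERS)]
w_proj = make_matrix(DIM, ART_DIM, rng)  # maps artifact features to token width
tokens = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
artifacts = [[rng.uniform(-1, 1) for _ in range(ART_DIM)] for _ in range(N_TOKENS)]

with_injection = forward(tokens, artifacts, layers, w_proj, INJECT_AT)
without_injection = forward(tokens, artifacts, layers, w_proj, INJECT_AT, gate=0.0)

# Layers before the injection point would stay frozen; later ones are trained.
trainable = [i >= INJECT_AT for i in range(N_LAYERS)]
```

In this sketch the outputs of the layers before `INJECT_AT` are identical with and without injection, which mirrors the stated goal of leaving early semantic processing untouched, while every later layer sees the fused representation.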

Experimental Results

The paper reports extensive experiments showing that Deep-VRM achieves state-of-the-art results across most benchmarks. The code and data are available alongside the arXiv publication under a CC BY 4.0 license.

Implications for Enterprise AI

For enterprise technology leaders deploying MLLMs in document verification, fraud detection, or content moderation, the ability to detect AI-generated images without compromising semantic performance is crucial. Deep-VRM offers a method to enhance forensic capabilities while maintaining the model's general intelligence, potentially reducing error rates in automated inspection and validation processes. Although the paper focuses on image forensics, the residual injection technique could be adapted to other modalities and domains where low-level signals need to be preserved alongside high-level understanding.
