
EyeMVP AI Model Enhances Retinal Screening by Learning OCT Insights from Fundus Photos

Researchers developed EyeMVP, a cross-modal retinal foundation model that enriches color fundus photography (CFP) with depth-resolved information from optical coherence tomography (OCT). Pretrained on 674,893 paired images from 112,642 patients across eight Chinese hospitals, EyeMVP outperforms representative retinal foundation models across 16 downstream tasks. It reaches an AUROC of 0.948 for macular edema detection (vs. 0.852 for EyeCLIP) and 0.825 for myopic macular schisis.

iGEN Editorial
June 16, 2026

Color fundus photography (CFP) is the mainstay for large-scale retinal screening, but its diagnostic capacity is constrained by the lack of depth-resolved structural information. Optical coherence tomography (OCT) provides cross-sectional retinal anatomy, yet is less accessible in population-level screening. To bridge this gap, researchers have developed EyeMVP, a cross-modal retinal foundation model that uses paired CFP–OCT pretraining to learn OCT-informed CFP representations, according to a study published on arXiv.

Model Architecture and Pretraining

EyeMVP is pretrained on 674,893 CFP–OCT image triples from 112,642 patients across eight hospitals in China. Each pair is strictly matched: the CFP and OCT images come from the same eye and were acquired on the same day. The model uses cross-modal masked reconstruction to enrich CFP representations with OCT-associated supervision, yet requires only CFP images at inference. Because en-face CFP and cross-sectional OCT have non-aligned imaging geometries, EyeMVP combines source-constrained cross-attention with CFP-derived structural masks.
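The article does not describe how EyeMVP's source-constrained cross-attention is implemented. The sketch below only illustrates the general idea of masked cross-attention. It assumes CFP patch tokens act as queries, OCT tokens act as keys and values, and a boolean mask (standing in for the structural constraint) decides which OCT tokens each CFP token may attend to. The function name `masked_cross_attention` and the toy vectors are hypothetical, not the authors' code.

```python
import math
from typing import List, Sequence

Vector = List[float]


def masked_cross_attention(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    allowed: Sequence[Sequence[bool]],
) -> List[Vector]:
    """Scaled dot-product cross-attention where query i may attend to key j
    only if allowed[i][j] is True. Hypothetical illustration, not EyeMVP's
    actual source-constrained attention."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    width = len(values[0])
    outputs: List[Vector] = []
    for i, q in enumerate(queries):
        # Similarity scores; disallowed positions are set to -inf.
        scores = [
            sum(a * b for a, b in zip(q, k)) * scale if allowed[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        m = max(scores)
        if m == -math.inf:
            # No permitted source tokens for this query: emit a zero vector.
            outputs.append([0.0] * width)
            continue
        # Numerically stable softmax restricted to permitted positions.
        weights = [math.exp(s - m) if s != -math.inf else 0.0 for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Attention output: weighted sum of value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)]
        )
    return outputs


# Toy usage: three "CFP" queries attending over three "OCT" tokens.
out = masked_cross_attention(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]],
    [[True, False, False], [True, True, True], [False, False, False]],
)
print(out[0])  # only the first OCT token is visible: [10.0, 0.0]
```

Setting disallowed scores to negative infinity before the softmax gives those positions exactly zero weight, which is the usual way attention masks are applied.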

Performance on Downstream Tasks

Across 16 downstream tasks, including classification, segmentation, few-shot adaptation, and cross-modal retrieval, EyeMVP outperforms representative retinal foundation models. The model shows consistent gains on tasks involving macular and optic nerve structure. For CFP-challenging macular diseases, EyeMVP achieves an AUROC of 0.948 for macular edema (vs. 0.852 for EyeCLIP) and 0.825 for myopic macular schisis.

Task                    | EyeMVP AUROC | Comparison model (AUROC)
Macular edema           | 0.948        | EyeCLIP (0.852)
Myopic macular schisis  | 0.825        | Not reported
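AUROC, the metric in the table above, is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The minimal sketch below computes it directly from that pairwise definition. The function name `auroc` and the sample data are hypothetical, and the article does not describe the study's exact evaluation pipeline. In practice, libraries such as scikit-learn provide `roc_auc_score`.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the pairwise (Mann-Whitney) definition.
    labels: 1 = disease present, 0 = absent; scores: model outputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic loop keeps the definition explicit. A rank-based formula gives the same value in O(n log n) time for large test sets.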

Comparison with Existing Models

Beyond its lead over EyeCLIP on macular edema, EyeMVP outperformed other representative retinal foundation models on the 16-task benchmark, according to the study. The gains are attributed to the architecture's use of pixel-level OCT supervision during pretraining.
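The article names pixel-level cross-modal masked reconstruction as the pretraining signal but gives no details. The sketch below shows the generic masked-reconstruction recipe under stated assumptions. A random subset of CFP patches is masked, the model predicts an OCT-derived target for each patch, and the mean squared error is computed only over the masked patches. The helper names `sample_patch_mask` and `masked_reconstruction_loss`, the mask ratio, and the toy targets are all hypothetical. EyeMVP's real targets, mask strategy, and loss may differ.

```python
import random
from typing import List, Sequence


def sample_patch_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Randomly mark round(num_patches * mask_ratio) patches as masked (True)."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_reconstruction_loss(
    predictions: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    mask: Sequence[bool],
) -> float:
    """Mean squared error over masked patches only.
    targets stand in for hypothetical OCT-derived per-patch supervision."""
    errors: List[float] = []
    for pred, tgt, is_masked in zip(predictions, targets, mask):
        if is_masked:
            errors.extend((p - t) ** 2 for p, t in zip(pred, tgt))
    if not errors:
        raise ValueError("mask selects no patches")
    return sum(errors) / len(errors)


mask = sample_patch_mask(8, 0.5, random.Random(0))
print(sum(mask))  # 4 of 8 patches masked
```

Restricting the loss to masked patches forces the encoder to infer the hidden content from visible CFP context, which is what lets the learned representation carry OCT-associated information even when only CFP is supplied at inference.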

Reader Study Results

In an exploratory reader study on macular edema, EyeMVP outperformed the junior and intermediate ophthalmologist groups but did not reach the performance of senior ophthalmologists. On myopic macular schisis, EyeMVP showed numerically higher balanced accuracy than all reader groups. According to the authors, these results suggest that pixel-level cross-modal reconstruction can enrich CFP representations with OCT-associated supervision, offering a practical route toward stronger CFP-based retinal analysis in screening settings.
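Balanced accuracy, used in the reader comparison, is not defined in the article. For a binary task it is commonly the mean of sensitivity and specificity, so a reader or model cannot score well simply by favoring the more common class. The sketch below assumes binary labels (1 = disease present). The function name `balanced_accuracy` and the example data are hypothetical, and the study's exact scoring protocol is not described. scikit-learn offers an equivalent `balanced_accuracy_score`.

```python
from typing import Sequence


def balanced_accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels/predictions."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, preds):
        if y == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        else:
            if p == 1:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of diseased eyes detected
    specificity = tn / (tn + fp)  # share of healthy eyes correctly cleared
    return (sensitivity + specificity) / 2


# Always answering "no disease" on an imbalanced set: plain accuracy is 0.75,
# but balanced accuracy exposes the missed case.
print(balanced_accuracy([1, 0, 0, 0], [0, 0, 0, 0]))  # 0.5
```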

The study indicates that a CFP-only model can absorb depth-related information from OCT during pretraining while requiring no OCT scans at inference time, which could enable more accurate large-scale screening programs. For enterprise technology decision-makers evaluating medical imaging AI, the pretraining approach and performance gains illustrate the value of cross-modal learning in settings where advanced imaging such as OCT is scarce.


Sources: