EyeMVP AI Model Enhances Retinal Screening by Learning OCT Insights from Fundus Photos

Researchers developed EyeMVP, a cross-modal retinal foundation model that enriches color fundus photography (CFP) with depth-resolved information from optical coherence tomography (OCT). Pretrained on 674,893 paired images from 112,642 patients across eight Chinese hospitals, EyeMVP outperforms representative retinal foundation models across 16 downstream tasks, with an AUROC of 0.948 for macular edema detection (vs. 0.852 for EyeCLIP) and 0.825 for myopic macular schisis.

iGEN Editorial

June 16, 2026

Color fundus photography (CFP) is the mainstay of large-scale retinal screening, but its diagnostic capacity is limited because it lacks depth-resolved structural information. Optical coherence tomography (OCT) provides cross-sectional retinal anatomy, yet is less accessible in population-level screening. To bridge this gap, researchers have developed EyeMVP, a cross-modal retinal foundation model that uses paired CFP–OCT pretraining to learn OCT-informed CFP representations, according to a study posted on arXiv.

Model Architecture and Pretraining

EyeMVP is pretrained on 674,893 CFP–OCT image triples from 112,642 patients across eight hospitals in China, with each triple strictly paired from the same eye on the same day. The model uses cross-modal masked reconstruction to enrich CFP representations with OCT-associated supervision, while requiring only CFP images at inference. Because en-face CFP and cross-sectional OCT do not share an aligned imaging geometry, EyeMVP combines source-constrained cross-attention with CFP-derived structural masks.
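
To make the idea of constraining which sources a token may attend to more concrete, here is a minimal sketch of scaled dot-product cross-attention with a per-query source mask. It is an illustration only: the article does not describe how EyeMVP implements source-constrained cross-attention, so the function name `masked_cross_attention`, the `allowed` mask, and the toy vectors are all assumptions, not the study's method.

```python
import math


def masked_cross_attention(queries, keys, values, allowed):
    """Scaled dot-product cross-attention with a per-query source mask.

    queries: list of query vectors (hypothetically, masked-patch tokens)
    keys, values: lists of source vectors of equal length
    allowed[i][j]: True if query i may attend to source j (assumed constraint)
    """
    dim = len(queries[0])
    outputs = []
    for query, allow_row in zip(queries, allowed):
        # Score only the permitted sources; disallowed ones are marked None.
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) if ok else None
            for key, ok in zip(keys, allow_row)
        ]
        valid = [s for s in scores if s is not None]
        if not valid:
            raise ValueError("each query needs at least one allowed source")
        # Numerically stable softmax over the permitted sources only.
        peak = max(valid)
        weights = [0.0 if s is None else math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the source values.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


# Toy example: one query, two sources, only the first source is permitted.
toy_queries = [[1.0, 0.0]]
toy_keys = [[1.0, 0.0], [0.0, 1.0]]
toy_values = [[10.0, 0.0], [0.0, 10.0]]
print(masked_cross_attention(toy_queries, toy_keys, toy_values, [[True, False]]))
# prints [[10.0, 0.0]]
```

With the mask set to `[[True, True]]`, the same query would blend both sources, which is the behavior the constraint is meant to prevent in this sketch.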

Performance on Downstream Tasks

Across 16 downstream tasks, including classification, segmentation, few-shot adaptation, and cross-modal retrieval, EyeMVP outperforms representative retinal foundation models. The model shows consistent gains on tasks involving macular and optic nerve structure. For macular diseases that are difficult to detect on CFP, EyeMVP achieves an AUROC of 0.948 for macular edema (vs. 0.852 for EyeCLIP) and 0.825 for myopic macular schisis.

Task	EyeMVP AUROC	Comparison Model AUROC
Macular edema	0.948	0.852 (EyeCLIP)
Myopic macular schisis	0.825	Not reported
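
For readers less familiar with the metric, AUROC is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it directly from that definition. The function name `auroc` and the toy labels and scores are illustrative assumptions and are not data from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for disease present, 0 for disease absent
    scores: model outputs, higher meaning more likely diseased
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (not from the study): 3 diseased eyes, 3 healthy eyes.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(auroc(toy_labels, toy_scores), 3))  # prints 0.889
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.948 and 0.852 figures above are reported.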

Comparison with Existing Models

In addition to outperforming EyeCLIP on macular edema, EyeMVP exceeds the performance of other representative retinal foundation models across the 16-task benchmark, according to the study. The study attributes the improvement to the architecture's ability to incorporate OCT supervision at the pixel level during pretraining.

Reader Study Results

In an exploratory reader study on macular edema, EyeMVP exceeded the junior and intermediate ophthalmologist groups but did not reach senior ophthalmologist performance. On myopic macular schisis, EyeMVP showed numerically higher balanced accuracy than all reader groups. These results suggest that pixel-level cross-modal reconstruction can enrich CFP representations with OCT-associated supervision, providing a practical route toward stronger CFP-based retinal analysis in screening settings.
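
Balanced accuracy, the metric cited for the reader comparison, is the mean of sensitivity and specificity, which keeps a rare disease class from being swamped by the healthy majority. A minimal sketch follows. The function name `balanced_accuracy` and the toy labels are illustrative assumptions, not data from the reader study.

```python
def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity for binary labels (1 = disease)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sensitivity = tp / (tp + fn)  # share of diseased eyes caught
    specificity = tn / (tn + fp)  # share of healthy eyes cleared
    return (sensitivity + specificity) / 2


# Toy data (not from the study): 2 diseased eyes among 10.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_predictions = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(toy_labels, toy_predictions))  # prints 0.6875
```

Plain accuracy on the same toy data is 0.8, which shows how class imbalance can flatter a reader or model that misses half of the diseased eyes.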

The study suggests that AI models can learn depth-resolved information from OCT during pretraining without requiring OCT at inference time, potentially enabling more accurate large-scale screening programs. For enterprise technology decision-makers evaluating medical imaging AI, the pretraining methodology and performance gains highlight the value of cross-modal learning in settings where OCT is not routinely available.
