Artificial Intelligence #semantic gradients #mllms
SAGA Framework Uses Frozen MLLMs to Boost Visual Embedding Recall by 3-6 Points
Researchers propose SAGA, a framework that uses frozen multimodal large language models (MLLMs) as a source of attribute-aware training signals for vision encoders, replacing uniform scalar distances with semantic gradients. Using Group Relative Policy Optimization (GRPO) and attention distillation, SAGA improves zero-shot image retrieval Recall@1 by 3 to 6 percentage points on benchmark datasets.
Jun 16, 2026 1 source
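
For readers unfamiliar with the headline metric, Recall@1 is the fraction of queries whose single nearest gallery item carries the correct label. The sketch below is illustrative only and is not SAGA's evaluation code: the function names, the cosine-similarity ranking, and the toy 2-D embeddings are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recall_at_1(queries, query_labels, gallery, gallery_labels):
    """Fraction of queries whose top-ranked gallery item has the same label."""
    hits = 0
    for q, q_label in zip(queries, query_labels):
        # Index of the gallery embedding most similar to this query.
        best = max(range(len(gallery)), key=lambda i: cosine(q, gallery[i]))
        hits += gallery_labels[best] == q_label
    return hits / len(queries)

# Toy embeddings (hypothetical data, chosen so one query is retrieved wrongly).
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery_labels = ["cat", "dog", "bird"]
queries = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.2]]
query_labels = ["cat", "dog", "bird"]

score = recall_at_1(queries, query_labels, gallery, gallery_labels)
print(f"Recall@1 = {score:.3f}")
```

On this toy data two of the three queries retrieve the right item, so Recall@1 is about 0.667. Read this way, a 3 to 6 point gain corresponds to roughly 3 to 6 more correct top-1 retrievals per 100 queries.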