Artificial Intelligence #visual reasoning #multimodal
Visual-Seeker: Visual-Native AI Agent for Active Visual Reasoning in Multimodal Search
Researchers propose Visual-Seeker, a visual-native multimodal deep search agent that actively gathers fine-grained visual evidence during search. Using a synthesized dataset of 5K multimodal trajectories, it achieves state-of-the-art results on five benchmarks, outperforming several proprietary models.
Jun 16, 2026 1 source