iGEN

Topic

vision-language models

4 stories
Technology
Artificial Intelligence #geass #gated evidence-adaptive

GEASS: Gated Evidence-Adaptive Selective Caption Trust Tackles VLM Hallucination

Vision-language models often hallucinate objects, and feeding them their own captions can actually worsen accuracy. Researchers propose GEASS, a gated evidence-adaptive module that decides per query how much of the caption to trust, improving accuracy across four VLMs on two benchmarks without training or additional parameters.

Jun 16, 2026 1 source
Technology
Artificial Intelligence #vision-language models #spacecraft inspection

Prompt-Driven AI Models Enable On-Orbit Spacecraft Inspection Without Retraining

Researchers demonstrate that prompt-driven vision-language models can perform zero-shot instance segmentation of spacecraft components on orbit without modifying onboard weights, enabling post-launch semantic expansion. The approach achieves 0.385 mAP@0.5 on a test set of 129 images of unseen satellites, with strong performance on large structures but challenges on fine-scale appendages. Structured prompts improve accuracy by up to 82% over simple category names.

Jun 16, 2026 1 source
Technology
Artificial Intelligence #time series forecasting #vision-language models

TimeVista: Researchers Use Vision-Language Models as Judges for Time Series Forecasting Evaluation

Researchers propose using vision-language models (VLMs) as judges for time series forecasting, addressing limitations of traditional point-wise metrics. They introduce TimeVista, a benchmark of 5,563 samples, and show that VLM judgments are significantly more consistent with human preferences than conventional metrics. The study also assesses Time Series Foundation Models.

Jun 16, 2026 1 source
Technology
Artificial Intelligence #binary tracking #spatial qa

Open-Source Binary Tracking Boosts Robot Navigation Accuracy by 22.8% Without Cloud Dependence

BinTrack, a fully open-source spatial-localization agent, enables robots to answer spatial queries without relying on closed-source cloud models. It improves accuracy by up to 22.8% over other open-source implementations and matches GPT-4o on the challenging SpaceLocQA benchmark, with a 1.5x inference speedup. The research also introduces GangnamLoop, a real-world multi-trip dataset collected with a quadruped robot on public streets.

Jun 16, 2026 1 source