Topic
vision-language models
GEASS: Gated Evidence-Adaptive Selective Caption Trust Tackles VLM Hallucination
Vision-language models often hallucinate objects, and feeding them their own captions can actually worsen accuracy. Researchers propose GEASS, a gated evidence-adaptive module that decides per query how much of the caption to trust, improving accuracy across four VLMs on two benchmarks without training or additional parameters.
Prompt-Driven AI Models Enable On-Orbit Spacecraft Inspection Without Retraining
Researchers demonstrate that prompt-driven vision-language models can perform zero-shot instance segmentation of spacecraft components on orbit without modifying onboard weights, enabling post-launch semantic expansion. The approach achieves 0.385 mAP@0.5 on a test set of 129 images of unseen satellites, with strong performance on large structures but challenges on fine-scale appendages. Structured prompts improve accuracy by up to 82% over simple category names.
TimeVista: Researchers Use Vision-Language Models as Judges for Time Series Forecasting Evaluation
Researchers propose using vision-language models (VLMs) as judges for time series forecasting, addressing limitations of traditional point-wise metrics. They introduce TimeVista, a benchmark of 5,563 samples, and show that VLMs achieve significantly higher consistency with human preferences than conventional metrics. The work also assesses Time Series Foundation Models.
Open-Source Binary Tracking Boosts Robot Navigation Accuracy by 22.8% Without Cloud Dependence
BinTrack, a fully open-source spatial-localization agent, enables robots to answer spatial queries without relying on closed-source cloud models. It improves accuracy by up to 22.8% over other open-source implementations and matches GPT-4o on the challenging SpaceLocQA benchmark, with a 1.5x inference speedup. The research also introduces GangnamLoop, a real-world multi-trip dataset collected with a quadruped robot on public streets.