vision-language-action

6 stories

New Training-Free Method Enables Robots to Follow Personalized Commands Like 'Bring My Cup'

Researchers propose Visual Attentive Prompting (VAP), a training-free perceptual adapter that enables vision-language-action models to follow personalized commands by using reference images as visual prompts. VAP outperforms generic policies and token-learning baselines on simulation and real-world benchmarks.

Jul 8, 2026 1 source

New Training-Free Method Compresses Vision-Language-Action Models by Up to 50% Without Performance Loss

Technology

Artificial Intelligence #finetuning #vision-language-action

Gia-Binh Ho et al. found that Vision-Language-Action (VLA) models exhibit severe layer-wise redundancy. They introduce a training-free compression pipeline that uses Centered Kernel Alignment to remove twin layers, achieving up to 50% depth reduction, 40-50% faster fine-tuning, and 30% faster inference while matching or exceeding full-scale performance.

Jun 20, 2026 1 source
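
The layer-similarity idea in the compression story above can be illustrated with linear Centered Kernel Alignment (CKA). The following is a minimal sketch, not the authors' pipeline: the summary does not say how "twin layers" are defined, so the adjacent-layer comparison, the `threshold` value, and the function names `linear_cka` and `find_twin_layers` are all assumptions made for illustration.

```python
import math

def center_columns(X):
    """Subtract the per-feature mean from every row of X (a list of rows)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def cross_sq_norm(X, Y):
    """Squared Frobenius norm of X^T Y for two matrices with equal row counts."""
    total = 0.0
    for a in range(len(X[0])):
        for b in range(len(Y[0])):
            s = sum(X[i][a] * Y[i][b] for i in range(len(X)))
            total += s * s
    return total

def linear_cka(X, Y):
    """Linear CKA between two activation matrices (samples x features)."""
    Xc, Yc = center_columns(X), center_columns(Y)
    num = cross_sq_norm(Xc, Yc)
    den = math.sqrt(cross_sq_norm(Xc, Xc) * cross_sq_norm(Yc, Yc))
    return num / den if den > 0 else 0.0

def find_twin_layers(activations, threshold=0.95):
    """Hypothetical criterion: flag layer i+1 when its activations are
    nearly identical (by CKA) to those of layer i on the same inputs."""
    return [
        i + 1
        for i in range(len(activations) - 1)
        if linear_cka(activations[i], activations[i + 1]) >= threshold
    ]
```

Here `activations` is assumed to be a list with one samples-by-features matrix per layer, all collected on the same batch of inputs. Linear CKA is invariant to isotropic scaling, so a layer that only rescales its input scores close to 1 and would be flagged under this criterion.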

New Benchmark and Method Address Occlusion in Vision-Language-Action Models for Robotics

Technology

Artificial Intelligence #vision-language-action #occlusion

Researchers introduced LIBERO-Occ, an occlusion-oriented benchmark for Vision-Language-Action (VLA) models, and proposed Viewpoint Imagination (VIM), a method that generates a complementary view from an occluded primary observation to condition action prediction. Experiments show that state-of-the-art VLAs suffer substantial performance degradation under occlusion, and VIM improves robustness across task suites, occlusion types, and severity levels without requiring additional cameras at deployment.

Jun 16, 2026 1 source

ResVLA Anchors Generative Policies with Residual Bridges to Reduce Noise and Speed Robot Learning

Technology

Artificial Intelligence #generative vla #robotics

A team of researchers proposes ResVLA, a new architecture for generative Vision-Language-Action (VLA) policies that replaces the standard 'generation-from-noise' paradigm with a 'refinement-from-intent' approach. By using spectral analysis to separate robot motion into a deterministic low-frequency intent anchor and a stochastic high-frequency residual, the model achieves faster convergence, stronger robustness to perturbations, and competitive performance in both simulated and real-world robot experiments.

Jun 16, 2026 1 source
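
The low-frequency/high-frequency split described in the ResVLA story above can be sketched with a plain discrete Fourier transform. This is an illustration under stated assumptions, not the ResVLA architecture: the summary does not name the transform or the cutoff rule, so the DFT low-pass filter, the `cutoff` parameter, and the function name `split_intent_residual` are hypothetical.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft_real(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]

def split_intent_residual(trajectory, cutoff):
    """Split a 1-D motion signal into a smooth anchor and a residual.

    Bins 0..cutoff (and their mirrored conjugates) form the low-frequency
    anchor; everything left over is the high-frequency residual.
    """
    n = len(trajectory)
    spectrum = dft(trajectory)
    low = [
        c if (k <= cutoff or k >= n - cutoff) else 0j
        for k, c in enumerate(spectrum)
    ]
    anchor = idft_real(low)
    residual = [x - a for x, a in zip(trajectory, anchor)]
    return anchor, residual
```

With `cutoff=0` the anchor reduces to the mean of the trajectory, and raising `cutoff` lets progressively faster motion components into the anchor. By construction the anchor and residual sum back to the original signal, up to floating-point rounding.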

FineVLA Framework Raises Robot Instruction-Following Success to 62.7% in Real-World Dual-Arm Manipulation

Technology

Artificial Intelligence #vision-language-action #robotics

Researchers introduce FineVLA, an open framework for fine-grained instruction alignment in vision-language-action (VLA) robot policies. The framework includes a dataset of 47,159 human-verified trajectories, a benchmark with 500 videos and 11,631 atomic facts, and a steerable policy that improves real-world dual-arm manipulation success from 49.9% (raw-only) to 62.7%.

Jun 16, 2026 2 sources

X-Tokenizer: Semantic Action Tokenizer Boosts Multimodal Grounding by 13.5% Over FAST

Technology

Artificial Intelligence #x-tokenizer #multimodal

Researchers propose X-Tokenizer, a new action tokenizer that treats tokenization as semantic interface learning rather than mere compression. Its lightweight architecture places a Semantic Residual Quantization (SRQ) stage between an encoder and a decoder, and it improves multimodal grounding by 13.5% and long-horizon task performance by 8.25 points over existing methods such as FAST.

Jun 16, 2026 1 source