Topic
vision-language-action
Artificial Intelligence #vision-language-action #robotics
FineVLA Framework Improves Robot Instruction Following to 62.7% Success in Real-World Dual-Arm Manipulation
Researchers introduce FineVLA, an open framework for fine-grained instruction alignment in vision-language-action (VLA) robot policies. The framework includes a dataset of 47,159 human-verified trajectories, a benchmark with 500 videos and 11,631 atomic facts, and a steerable policy that raises real-world dual-arm manipulation success from 49.9% (raw-only) to 62.7%, a gain of 12.8 percentage points.
Jun 16, 2026 2 sources
Artificial Intelligence #x-tokenizer #multimodal
X-Tokenizer: Semantic Action Tokenizer Boosts Multimodal Grounding by 13.5% Over FAST
Researchers propose X-Tokenizer, a new action tokenizer that treats tokenization as semantic interface learning rather than mere compression. Built on a lightweight pipeline of an encoder, Semantic Residual Quantization (SRQ), and a decoder, it improves multimodal grounding by 13.5% and long-horizon task performance by 8.25 points over existing methods such as FAST.
Jun 16, 2026 1 source