New Benchmark and Method Address Occlusion in Vision-Language-Action Models for Robotics

iGEN Editorial
June 16, 2026

Vision-Language-Action (VLA) models have achieved strong performance on standard manipulation benchmarks, but most evaluations assume that task-relevant objects are fully visible. According to the paper "LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination," this assumption often fails in realistic settings, where occlusion makes manipulation partially observable. The authors introduced LIBERO-Occ, an occlusion-oriented extension of the LIBERO benchmark, and Viewpoint Imagination (VIM), a method that generates a complementary view from an occluded primary observation and conditions action prediction on both observed and imagined evidence. Experiments show that state-of-the-art VLAs suffer substantial performance degradation under occlusion, and VIM improves robustness across task suites, occlusion types, and severity levels without requiring additional cameras at deployment time.

The Occlusion Challenge for Vision-Language-Action Models

VLA models integrate visual perception, language understanding, and action generation for robotic manipulation. Standard benchmarks typically present scenes where task-relevant objects are fully visible, a condition that rarely holds in real-world deployments. The paper identifies scene-induced occlusion as a fundamental challenge for VLA models. In settings such as cluttered bins, shelves, or industrial environments, objects may be partially hidden by other items or by the robot's own gripper. The authors report that state-of-the-art VLAs experience substantial performance degradation when occlusion is present, underscoring the need for robust perception-completion mechanisms.

LIBERO-Occ: A Benchmark for Scene-Induced Occlusion

To systematically evaluate VLA models under occlusion, the researchers created LIBERO-Occ, an occlusion-oriented extension of the existing LIBERO benchmark. This new benchmark introduces various occlusion types and severity levels across multiple manipulation task suites. The paper states that LIBERO-Occ is designed to assess how well VLAs handle partially observable conditions. The benchmark and corresponding code are publicly available, enabling the research community to test and compare occlusion-robust methods.
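To make this kind of evaluation concrete, here is a minimal Python sketch of how results on an occlusion benchmark might be tabulated. Success rates are grouped by task suite, occlusion type and severity, and each occluded cell is compared against the unoccluded baseline as a relative drop. The record layout, labels such as `occluder_x`, and both helper functions are hypothetical illustrations, not part of the released LIBERO-Occ code, and the episode outcomes are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical episode record: (task_suite, occlusion_type, severity, success).
# Field order and labels are illustrative, not LIBERO-Occ's actual schema.
Episode = Tuple[str, str, str, bool]
Cell = Tuple[str, str, str]


def success_rates(episodes: Iterable[Episode]) -> Dict[Cell, float]:
    """Aggregate the success rate for each (suite, occlusion type, severity) cell."""
    totals: Dict[Cell, int] = defaultdict(int)
    wins: Dict[Cell, int] = defaultdict(int)
    for suite, occ_type, severity, success in episodes:
        cell = (suite, occ_type, severity)
        totals[cell] += 1
        wins[cell] += int(success)
    return {cell: wins[cell] / totals[cell] for cell in totals}


def relative_drop(occluded_rate: float, clean_rate: float) -> float:
    """Fraction of clean-scene success lost under occlusion (0.0 means no drop)."""
    if clean_rate == 0.0:
        return 0.0  # nothing to lose if the policy never succeeds on clean scenes
    return (clean_rate - occluded_rate) / clean_rate


# Toy rollouts with made-up outcomes, only to show the bookkeeping.
episodes = [
    ("suite_a", "none", "none", True),
    ("suite_a", "none", "none", True),
    ("suite_a", "occluder_x", "high", True),
    ("suite_a", "occluder_x", "high", False),
]
rates = success_rates(episodes)
clean = rates[("suite_a", "none", "none")]
occluded = rates[("suite_a", "occluder_x", "high")]
print(clean, occluded, relative_drop(occluded, clean))
```

Reporting a relative drop rather than a raw difference makes it easier to compare models whose clean-scene success rates differ.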

Viewpoint Imagination (VIM): Technical Overview

The proposed method, Viewpoint Imagination (VIM), addresses occlusion by generating a complementary view from the primary occluded observation. VIM conditions action prediction on both the observed and the imagined evidence, effectively providing the model with a more complete scene understanding. According to the authors, this approach improves robustness across task suites, occlusion types, and severity levels. Importantly, VIM does not require additional cameras at deployment time, meaning it can be applied to existing robotic systems without hardware modifications. The paper suggests that viewpoint imagination is a promising mechanism for perception completion in partially observable manipulation.
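The article does not describe VIM's architecture, so what follows is only a schematic Python sketch of the data flow it summarizes. A single primary image goes in, a complementary view is synthesized from it, and the action head is conditioned on features from both. The class name `ImaginedViewPolicy`, the three placeholder callables, and the choice to fuse features by concatenation are all assumptions made for illustration. In a real system each callable would be a learned model.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[float]     # toy stand-in for a flattened camera frame
Features = List[float]  # toy stand-in for an encoder embedding


@dataclass
class ImaginedViewPolicy:
    """Hypothetical wrapper: one real camera in, one action out.

    The callables are placeholders for learned models; VIM's actual
    view generator, encoder and action head are not described here.
    """
    imagine_view: Callable[[Image], Image]       # synthesizes a complementary view
    encode: Callable[[Image], Features]          # shared visual encoder
    act: Callable[[Features, str], List[float]]  # language-conditioned action head

    def __call__(self, primary: Image, instruction: str) -> List[float]:
        # The second view is computed from the primary frame, not captured.
        imagined = self.imagine_view(primary)
        # Condition on observed and imagined evidence; concatenation is an assumed fusion choice.
        fused = self.encode(primary) + self.encode(imagined)
        return self.act(fused, instruction)


# Toy stand-ins so the data flow can be run end to end.
policy = ImaginedViewPolicy(
    imagine_view=lambda img: list(reversed(img)),
    encode=lambda img: [sum(img) / len(img)],
    act=lambda feats, instruction: [sum(feats)],
)
action = policy([0.0, 1.0, 2.0, 3.0], "pick up the bowl")
print(action)
```

Because the imagined view is derived from the primary frame, the wrapper takes only one camera input, which mirrors the article's point that VIM needs no additional hardware at deployment.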

Implications for Robotics in Logistics and Supply Chain

Although the experiments are conducted on manipulation benchmarks, the ideas behind LIBERO-Occ and VIM are relevant to robotics in logistics and supply chain environments. Occlusion is common in warehouse automation, for example when a robotic arm picks items from cluttered bins or when a mobile robot operates among tightly packed shelves. Generating a complementary view without extra cameras could improve the reliability of automated picking, packing, and sorting, although the reported gains come from benchmark tasks rather than production warehouses. The research provides a foundation for developing VLA models that are more resilient to the imperfect viewing conditions typical of industrial settings.

| Component | Description |
| --- | --- |
| LIBERO-Occ | Occlusion-oriented extension of the LIBERO benchmark for evaluating VLA models under scene-induced occlusion |
| Viewpoint Imagination (VIM) | Generates a complementary view from the occluded primary observation; conditions action prediction on observed and imagined evidence |
| Key result | State-of-the-art VLAs suffer substantial degradation under occlusion; VIM improves robustness without additional cameras |

The paper's authors, Taishan Li, Jiwen Zhang, Siyuan Wang, Xuanjing Huang, and Zhongyu Wei, have released the benchmark and code publicly.

