EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video
Deformable objects such as elastic materials and fabrics are notoriously difficult for robots to handle because their dynamics are complex and vary from one object to the next. Traditional physics models often require per-object calibration or struggle to generalize to unseen shapes. A new research framework aims to address this by learning generalizable physics models from egocentric (first-person) RGB video alone.
In their arXiv paper, Hyunjin Kim, Ri-Zhao Qiu, Guangqi Jiang, and Xiaolong Wang frame the challenge this way:
"Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as elastic materials and fabrics, remains a major challenge for computer vision and robotics."
The authors introduce EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video using generalizable priors.
How EgoPhys Works
Instead of relying on per-spring optimization at test time, EgoPhys distills per-object inverse-physics solutions into a compact codebook, which lets it predict dense spring stiffness fields for unseen objects directly. Because its priors are learned from diverse egocentric interactions, the authors report that it outperforms baselines in three key areas:
- Reconstruction accuracy – matching observed deformations
- Future prediction – forecasting how an object will deform under forces
- Zero-shot generalization – performing well on objects and scenes never seen during training
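To make the codebook idea concrete, here is a toy sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' architecture: it assumes each codebook entry pairs a key vector with a log-stiffness value, that some encoder (stubbed out here with random numbers) produces one feature vector per spring, and that a spring's stiffness is a softmax-weighted blend of codebook entries in log space. The function name `predict_stiffness_field` and all parameters are hypothetical. What it illustrates is the property the paper emphasizes: a single forward pass yields a dense, per-spring stiffness field with no test-time optimization loop.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def predict_stiffness_field(spring_features, codebook_keys, codebook_log_k, temperature=1.0):
    """Hypothetical codebook lookup returning one stiffness value per spring.

    spring_features: (S, D) per-spring features from an encoder (assumed; stubbed below).
    codebook_keys:   (K, D) one key per distilled per-object solution (assumed).
    codebook_log_k:  (K,)   log-stiffness associated with each key (assumed).
    """
    scores = spring_features @ codebook_keys.T / temperature  # (S, K) similarity scores
    weights = softmax(scores, axis=1)                          # each row sums to 1
    log_k = weights @ codebook_log_k                           # (S,) blend in log space
    return np.exp(log_k)                                       # positive stiffness per spring


# Usage with random stand-ins for the learned codebook and the encoder output.
rng = np.random.default_rng(0)
n_entries, feat_dim, n_springs = 8, 16, 100
codebook_keys = rng.normal(size=(n_entries, feat_dim))
codebook_log_k = np.log(rng.uniform(50.0, 5000.0, size=n_entries))
spring_features = rng.normal(size=(n_springs, feat_dim))
k_field = predict_stiffness_field(spring_features, codebook_keys, codebook_log_k)
print(k_field.shape)  # (100,)
```

Blending in log space keeps every predicted stiffness positive and within the range spanned by the codebook entries, a reasonable choice for a quantity that can vary over orders of magnitude; the parameterization EgoPhys actually uses may differ.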
Training and Dataset
To support training and evaluation, the researchers curated an egocentric interaction dataset covering diverse deformable objects, scenes, and manipulation styles. This variety supplies the examples the model needs to learn generalizable physics priors from RGB video without ground-truth material properties, so that new objects can be handled without per-spring optimization at test time.
Real-World Robot Deployment
The researchers deployed EgoPhys on a real xArm6 robot, demonstrating that a digital twin initialized from a single egocentric human play video can serve as an internal world representation to aid in deformable-object planning. This experiment highlights egocentric RGB observations as a scalable path toward real-to-sim pipelines, where a robot can learn from human demonstrations and then simulate possible actions before executing them.
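The "simulate before executing" loop can be sketched with a deliberately tiny digital twin. This is a minimal sketch under illustrative assumptions, not the authors' simulator or planner: the object is a 1-D chain of point masses joined by springs (node 0 pinned, as if held by a gripper), a per-spring stiffness array stands in for a predicted stiffness field, integration is semi-implicit Euler with simple velocity damping, and planning is a sampling-based search that simulates a set of candidate pulls on the free end (here an evenly spaced grid) and keeps the best. `simulate`, `plan_pull`, and every parameter value are hypothetical; units are meters, newtons, and N/m.

```python
import numpy as np


def simulate(x0, k, force, steps=1000, dt=1e-3, mass=0.05, damping=1.0, rest=0.1):
    """Roll out a 1-D mass-spring chain under a constant pull on its last node.

    x0: (N,) initial node positions [m]; k: (N-1,) per-spring stiffness [N/m];
    force: constant external force on the last node [N]. Node 0 stays pinned.
    """
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        stretch = np.diff(x) - rest   # extension of each spring (positive = stretched)
        tension = k * stretch         # Hooke's law per spring
        f = np.zeros_like(x)
        f[:-1] += tension             # a stretched spring pulls its left node right...
        f[1:] -= tension              # ...and its right node left
        f -= damping * v              # simple velocity damping
        f[-1] += force                # external pull on the free end
        v += dt * f / mass            # semi-implicit Euler: update velocity first,
        v[0] = 0.0                    # keep node 0 pinned,
        x += dt * v                   # then advance positions with the new velocity
    return x


def plan_pull(x0, k, goal_tip, candidates):
    """Sampling-based planner: simulate every candidate force and keep the best one."""
    errors = [abs(simulate(x0, k, f)[-1] - goal_tip) for f in candidates]
    best = int(np.argmin(errors))
    return candidates[best], errors[best]


# Usage: a 6-node chain at rest length with a uniform stand-in stiffness field.
n_nodes = 6
x0 = np.arange(n_nodes) * 0.1
k = np.full(n_nodes - 1, 200.0)
candidates = np.linspace(0.0, 5.0, 21)  # 0.0, 0.25, ..., 5.0 N
best_force, err = plan_pull(x0, k, goal_tip=0.55, candidates=candidates)
print(best_force, err)
```

For five identical 200 N/m springs in series, a constant pull F stretches the chain by 5F/200 at equilibrium, so moving the tip from 0.5 m to 0.55 m requires F = 2.0 N, and the planner should select the 2.0 candidate; the damping and step count are chosen so each rollout settles before it is scored. A real robot planner would search over much richer actions (grasp points, 3-D trajectories) in a full 3-D twin.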
Implications for Enterprise Automation
While the paper is primarily a computer vision and robotics contribution, it has clear relevance to supply chain and logistics automation. Robots in warehouses and factories handle rigid objects such as boxes and pallets well but struggle with soft items such as garments, food products, or medical supplies. A framework like EgoPhys could let robots plan how to manipulate such items by first building a digital twin from a single video, without tedious manual per-object calibration. The demonstration on an xArm6, a commercially available and relatively low-cost robot arm, hints that the method could be adapted to existing automation platforms without specialized hardware.
Because EgoPhys outperforms baselines not only in reconstruction and future prediction but also in zero-shot generalization, the results suggest it remains robust on objects and environments absent from training. The work is available on arXiv under the identifier 2606.16202 as open access under a Creative Commons license.