It’s a Race to Capture Real-World AI Training Data in India’s Unregulated Market

Home-services startup Pronto pilots in-home video recordings to train physical AI, spotlighting a fast-growing but loosely regulated industry in India. Startups like Neocambrian AI and Humyn Labs collect first-person video from kitchens, factories, and warehouses to train robots and world models, catering to robotics OEMs and defence firms. The practice raises privacy and consent concerns, with some factories pausing pilots after backlash.

iGEN Editorial

June 15, 2026

Indian startups are racing to capture real-world egocentric video data to train physical AI systems, tapping into a loosely regulated industry that feeds the global robotics supply chain, according to a Business Today report.

The Emerging Data-Capture Ecosystem

Home-services startup Pronto recently admitted to piloting in-home video recordings to train physical AI systems, shedding light on a fast-growing and loosely regulated industry of AI data capture and labelling. Pronto is not alone. Startups such as Human Archive, Humyn Labs, Egolab AI and Neocambrian are collecting egocentric data: first-person video captured through wearables or head-mounted cameras. They partner with cloud kitchens, hotels, home-services platforms, small textile and garment factories, and warehouse operators to record everyday tasks, from cooking meals and washing dishes to stitching garments, assembling components and sorting inventory, the report said. In some cases, startups have built dedicated ‘data factories’ equipped with motion-tracking rigs.

How It Works: From Kitchens to Factories

Abhinav Kukreja, founder of Neocambrian AI, which raised funds from angels including Dalmia Family Office Trust, told Business Today: “Typical clients are robotics, vision-language-action model and world model companies.” He added, “There is no equivalent repository of physical behaviour on the internet. Robots need to learn from messy homes, crowded factories, small shops and repair stations, which India offers.” Kukreja said that, when done right, such data collection can become an additional source of paid work for workers and households, and that the company compensates both environment owners and data collectors.

Manish Agarwal, co-founder of Humyn Labs, which works with leading frontier labs, said demand is growing from robotics OEMs, software makers and enterprises. “We collect and convert this into episodic strings for robot memory, which helps build low to mid-level agentic capabilities including physical action, voice, sight and mobility,” he said. The company uses verified networks of workers across 16 countries, because robots cannot be trained on Indian environments alone. “For European domestic robotics to navigate better, we need training data similar to that environment,” Agarwal said.

Demand Drivers: Robotics, Defence and World Models

This data trains world models and physical AI systems, teaching robots to navigate and act in messy, unstructured environments and helping smart glasses recognise objects. One industry insider told Business Today there is significant demand from the defence industry, particularly for autonomous drone applications. The startups argue that this is India’s entry into the global AI value chain, and that working with frontier labs could help the country train competitive models of its own.

Concerns Over Privacy and Consent

The practice also raises questions about privacy, legality and compensation, as in some cases videos are recorded without the workers’ consent or payment. Times of India learned that some factories have paused such pilots after the recent backlash.

Madhukar Yarra, CEO of Bengaluru-based NextWealth, which annotates these videos, called the trend a flash in the pan. Much of the data, he said, is collected through unorganised gig work.

Implications for India’s Role in the AI Value Chain

For trade executives, the emergence of a data-capture industry in India signals a new services export opportunity: supplying training data to global robotics and AI firms. However, the lack of regulation on consent, data privacy and worker compensation could create reputational and legal risks for international buyers. Companies sourcing this data should vet suppliers for compliance with local and international norms. The tension between cost arbitrage and long-term value will shape whether India becomes a sustainable hub for physical AI training data or proves a flash in the pan.
