
It’s a Race to Capture Real-World AI Training Data in India’s Unregulated Market

Home-services startup Pronto pilots in-home video recordings to train physical AI, spotlighting a fast-growing but loosely regulated industry in India. Startups like Neocambrian AI and Humyn Labs collect first-person video from kitchens, factories, and warehouses to train robots and world models, catering to robotics OEMs and defence firms. The practice raises privacy and consent concerns, with some factories pausing pilots after backlash.

iGEN Editorial
June 15, 2026

Indian startups are racing to capture real-world egocentric video data to train physical AI systems, tapping into a loosely regulated industry that feeds the global robotics supply chain, according to a Business Today report.

The Emerging Data-Capture Ecosystem

Home-services startup Pronto recently admitted to piloting in-home video recordings to train physical AI systems, shining a light on a fast-growing and loosely regulated industry of AI data capture and labelling. Pronto is not alone. Startups such as Human Archive, Humyn Labs, Egolab AI and Neocambrian are collecting egocentric data: first-person video captured through wearables or head-mounted cameras. They partner with cloud kitchens, hotels, home-services platforms, small textile and garment factories, and warehouse operators to record everyday tasks, from cooking meals and washing dishes to stitching garments, assembling components and sorting inventory, the report said. In some cases, startups have built dedicated ‘data factories’ equipped with motion-tracking rigs.

How It Works: From Kitchens to Factories

Abhinav Kukreja, founder of Neocambrian AI, which has raised funding from angel investors including Dalmia Family Office Trust, told Business Today: “Typical clients are robotics, vision-language-action model and world model companies.” He added, “There is no equivalent repository of physical behaviour on the internet. Robots need to learn from messy homes, crowded factories, small shops and repair stations, which India offers.” Kukreja noted that, when done right, such data collection can become an additional source of paid work for workers and households, and that the company compensates both environment owners and data collectors.

Manish Agarwal, co-founder of Humyn Labs, which works with leading frontier labs, said demand is growing from robotics OEMs, software makers and enterprises. “We collect and convert this into episodic strings for robot memory, which helps build low to mid-level agentic capabilities including physical action, voice, sight and mobility,” he said. The company uses verified networks of workers across 16 countries, because robots cannot be trained on Indian environments alone. “For European domestic robotics to navigate better, we need training data similar to that environment,” Agarwal said.

Demand Drivers: Robotics, Defence and World Models

This data trains world models and physical AI systems, teaching robots to navigate and act in messy, unstructured environments and helping smart glasses recognise objects. One industry insider told Business Today there is significant demand from the defence industry, particularly for autonomous drone applications. The startups argue that this is India’s entry into the global AI value chain, and that working with frontier labs could help the country train competitive models of its own.

Concerns Over Privacy and Consent

The practice also raises questions about privacy, legality and compensation, since in some cases videos are recorded without the workers’ consent or without paying them. Times of India learned that some factories have paused such pilots after the recent backlash.

Madhukar Yarra, CEO of Bengaluru-based NextWealth, which annotates these videos, called the trend a flash in the pan. Much of the data is collected through unorganised gig work, he said.

Implications for India’s Role in the AI Value Chain

For trade executives, the emergence of a data-capture industry in India signals a new services export opportunity: supplying training data to global robotics and AI firms. However, the lack of regulation on consent, data privacy and worker compensation could create reputational and legal risks for international buyers. Companies sourcing this data should vet it for compliance with local and international norms. The tension between cost arbitrage and long-term value will shape whether India becomes a sustainable hub for physical AI training data or remains a flash in the pan.


Source: Business Today
