FlowMPC: New Framework Combines Flow Matching and World Models to Improve Robot Manipulation

Researchers introduce FlowMPC, a framework that pairs imitation-learned flow matching policies with a learned world model for test-time planning using Model Predictive Path Integral (MPPI) control. On the ManiSkill manipulation tasks PickCube and PickSingleYCB, adding the world model improved performance over the flow matching policy alone, with clear gains in end-of-episode success.

iGEN Editorial

June 16, 2026

A key limitation of behavior cloning is that it learns to mimic expert demonstrations without directly optimizing for task success. Flow Matching (FM), a powerful technique for behavior cloning in multimodal action spaces, suffers from this same shortcoming. A new research paper introduces FlowMPC, a framework that augments an FM policy with a learned world model to enable test-time planning, boosting performance on manipulation tasks.

The work, posted on arXiv by researchers Hamel and Chandon, builds on the TD-MPC2 model-based reinforcement learning algorithm. It investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy.

The Challenge with Flow Matching

Flow Matching is an imitation learning method that learns to generate actions by transforming a simple noise distribution into the distribution of expert actions. According to the paper, it has been effective for behavior cloning in complex, multimodal action spaces. However, because FM policies are not trained to maximize expected return, there is room to improve their performance at test time.
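To make the noise-to-action idea concrete, here is a minimal sketch of one common flow matching formulation, assuming a straight-line path between a noise sample and an expert action. The function names, the Euler integrator, and the closed-form toy velocity field are illustrative assumptions and are not taken from the FlowMPC paper; in practice a trained neural network would take the place of `toy_velocity`.

```python
import numpy as np

def cfm_training_pair(noise, expert_action, t):
    """Build one regression example for a straight-line flow matching path.

    x_t lies on the line from the noise sample to the expert action, and the
    network would be trained to predict the constant velocity along that line.
    """
    x_t = (1.0 - t) * noise + t * expert_action
    target_velocity = expert_action - noise
    return x_t, target_velocity

def sample_action(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(noise, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy field never divides by zero
        x = x + dt * velocity_fn(x, t)
    return x

# Hypothetical stand-in for a trained network: if every expert demonstration
# used the same action `expert`, the ideal velocity field has a closed form.
expert = np.array([0.3, -0.2, 0.5])

def toy_velocity(x, t):
    return (expert - x) / (1.0 - t)

rng = np.random.default_rng(0)
action = sample_action(toy_velocity, rng.standard_normal(3))
```

In this toy setting the sampled `action` lands on `expert` regardless of the starting noise. Note that nothing in the training target refers to reward or task success, which is the gap that test-time planning is meant to address.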

Introducing FlowMPC

FlowMPC combines an imitation-learned FM policy with a learned world model. The world model acts as a simulator, predicting the outcomes of potential action sequences. At test time, the FM policy proposes candidate action trajectories, and MPPI uses the world model to score those candidates by predicted outcome and choose the actions to execute. This approach allows the system to plan ahead without modifying the FM training objective.
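The planning loop described above can be sketched as follows. This is a simplified, single-iteration MPPI-style update under stated assumptions: the world model is a hypothetical function returning a next state and a reward, the candidates are random arrays standing in for FM policy proposals, and the discount and temperature values are arbitrary. The paper's actual variant, which builds on TD-MPC2, may differ, for example by planning in a latent space, iterating the update, or adding a terminal value estimate.

```python
import numpy as np

def rollout_return(world_model, state, actions, discount=0.99):
    """Sum discounted rewards predicted by the world model for one sequence."""
    total, s = 0.0, state
    for h, a in enumerate(actions):
        s, r = world_model(s, a)
        total += (discount ** h) * r
    return total

def mppi_select(world_model, state, candidates, temperature=0.5):
    """Weight candidate sequences by predicted return and blend them.

    candidates has shape (K, H, A): K sequences, horizon H, action dim A.
    Returns the first action of the blended plan and the per-candidate weights.
    """
    returns = np.array([rollout_return(world_model, state, seq) for seq in candidates])
    # Softmax over returns; subtracting the max keeps the exponentials stable.
    weights = np.exp((returns - returns.max()) / temperature)
    weights /= weights.sum()
    plan = np.einsum("k,kha->ha", weights, candidates)
    return plan[0], weights

# Hypothetical toy world model: the state is a 2D position, an action is a
# displacement, and the reward is the negative distance to a goal.
goal = np.array([1.0, 0.0])

def toy_world_model(state, action):
    next_state = state + action
    return next_state, -np.linalg.norm(next_state - goal)

rng = np.random.default_rng(0)
candidates = rng.normal(scale=0.3, size=(64, 5, 2))  # stand-in for FM proposals
first_action, weights = mppi_select(toy_world_model, np.zeros(2), candidates)
```

Only the first action of the blended plan is executed before replanning from the next state, as is usual in model predictive control. One design caveat: averaging candidates drawn from a multimodal policy can blend distinct modes, so a low temperature, which concentrates the weight on the best-scoring sequences, may be preferable in that setting.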

The framework builds directly on TD-MPC2 (Hansen et al., 2024), a state-of-the-art model-based reinforcement learning method. The authors note that the world model is used only during inference, leaving the FM training procedure unchanged.

Results on Manipulation Benchmarks

The researchers evaluated FlowMPC on two tasks from the ManiSkill manipulation benchmark (Tao et al., 2025): PickCube and PickSingleYCB. Across both tasks, adding the world model improved performance over the FM policy alone. The gains were especially clear in end-of-episode success rates, indicating that planning helps the policy complete tasks more reliably.

Task            FM Policy Only    FlowMPC (FM + World Model)
PickCube        Lower success     Higher success (clear gains)
PickSingleYCB   Lower success     Higher success (clear gains)

Note: Exact numerical results are not provided in the paper; performance improvement is reported qualitatively.

Implications for AI-Powered Systems

While the experiments focus on simulated robot manipulation, the underlying approach, augmenting imitation-learned policies with model-based planning, has broader relevance. For enterprise systems that rely on behavior cloning, such as automated assembly or logistics handling, FlowMPC suggests that a world model can improve performance without changing how the policy is trained, although this has so far been shown only in simulation. Because the framework builds on TD-MPC2 and MPPI, it may be able to reuse existing model-based reinforcement learning tooling.

According to the paper, these results suggest that world-model-based planning can effectively complement flow-based imitation policies. The ability to improve policy performance at test time could reduce the need for extensive retraining when environments change—a valuable property for deployment in dynamic real-world settings.

The paper is available on arXiv under a Creative Commons license, with code and data expected to be released through associated links.
