Robots deployed in real-world environments must plan motions across diverse scenarios without per-scenario retuning, according to a new paper on arXiv. Current approaches such as end-to-end reinforcement learning can generalize, but they often become brittle under distribution shift, reward misspecification, and stochastic interactions. Model predictive path integral (MPPI) control provides strong gradient-free, real-time refinement, yet its performance depends on a well-shaped sampling prior, and designing these priors by hand does not scale to multi-scenario deployment.
The Challenge of Multi-Scenario Motion Planning
Traditional motion planning methods often rely on scenario-specific tuning, which is impractical when a robot must operate in varied environments. End-to-end reinforcement learning can adapt across settings but remains brittle. According to the paper, authored by Min Youngjae, Jovin D'sa, Faizan M Tariq, David Isele, Navid Azizan, and Sangjae Bae, MPPI control offers real-time optimization, but its effectiveness hinges on a carefully designed sampling prior. Because that prior has to be shaped by hand for each scenario, it becomes a bottleneck for autonomous systems that must cover many scenarios.
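For reference, one common form of the MPPI update makes the role of the sampling prior explicit. Here $\mu_t$ and $\Sigma_t$ are the prior mean and covariance of the control at step $t$, $K$ is the number of sampled control sequences, $f$ is the dynamics model, $q$ and $\phi$ are stage and terminal costs, and $\lambda > 0$ is a temperature. The notation is ours, and the paper's exact cost and parameterization may differ; many implementations also add a control-cost term to $S^{(k)}$.

$$
\begin{aligned}
v_t^{(k)} &= \mu_t + \epsilon_t^{(k)}, \qquad \epsilon_t^{(k)} \sim \mathcal{N}(0, \Sigma_t), \qquad k = 1, \dots, K \\
x_{t+1}^{(k)} &= f\big(x_t^{(k)}, v_t^{(k)}\big), \qquad x_0^{(k)} = x_0 \\
S^{(k)} &= \phi\big(x_T^{(k)}\big) + \sum_{t=0}^{T-1} q\big(x_t^{(k)}, v_t^{(k)}\big) \\
w^{(k)} &= \frac{\exp\!\big(-(S^{(k)} - \rho)/\lambda\big)}{\sum_{j=1}^{K} \exp\!\big(-(S^{(j)} - \rho)/\lambda\big)}, \qquad \rho = \min_j S^{(j)} \\
u_t &= \mu_t + \sum_{k=1}^{K} w^{(k)} \epsilon_t^{(k)}
\end{aligned}
$$

Because every sample is drawn around $\mu_t$ with spread $\Sigma_t$, a poorly placed prior spends most of the $K$ samples in high-cost regions, which is why the quality of the prior matters so much.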
Hierarchical Approach: Offline Learning, Online Optimization
The researchers present HOLO-MPPI (High-level Offline, Low-level Online MPPI), a multi-scenario motion planning framework that combines high-level policy learning with low-level stochastic optimal control. In the offline phase, the system learns a high-level policy that proposes scenario-robust plans in an abstract action space, using a learned world model for rollouts. During online execution, the policy acts as a data-driven prior generator: conditioned on the current observation and goal, it parameterizes MPPI's sampling distribution. MPPI then optimizes low-level control sequences around this prior in real time, adapting to local disturbances.
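The online loop described above can be sketched in a few dozen lines. This is a minimal illustration under loud assumptions: the dynamics are a toy 1D double integrator rather than a driving model, `high_level_policy` is a hand-written stand-in for the learned policy, and the "abstract action" (a clipped target velocity) plus its decoder `decode_prior` are hypothetical, not the paper's action space or architecture. The MPPI step is the standard weighted-perturbation update, with the prior supplying only the sampling mean.

```python
import math
import random

# Assumed toy setup: a 1D double integrator, not the paper's driving model.
DT = 0.1       # integration step in seconds (assumed)
HORIZON = 20   # MPPI planning horizon in steps (assumed)


def step(state, accel):
    """Advance state = (position, velocity) by one step under acceleration."""
    pos, vel = state
    return (pos + vel * DT, vel + accel * DT)


def rollout_cost(state, controls, goal):
    """Stage cost on position error and control effort, plus a terminal
    cost that asks the system to stop at the goal."""
    cost = 0.0
    for a in controls:
        state = step(state, a)
        cost += (state[0] - goal) ** 2 + 0.01 * a ** 2
    return cost + 10.0 * ((state[0] - goal) ** 2 + state[1] ** 2)


def high_level_policy(state, goal):
    """Hypothetical stand-in for the learned policy: maps the observation
    and goal to an abstract action, here a clipped target velocity."""
    return max(-2.0, min(2.0, goal - state[0]))


def decode_prior(state, target_vel, tau=0.5):
    """Hypothetical decoder: expands the abstract action into a nominal
    acceleration sequence (the MPPI prior mean) that tracks target_vel."""
    prior, vel = [], state[1]
    for _ in range(HORIZON):
        accel = (target_vel - vel) / tau
        prior.append(accel)
        vel += accel * DT
    return prior


def mppi(state, goal, mean, rng, sigma=0.5, samples=64, lam=1.0):
    """One MPPI update: sample Gaussian perturbations around the prior mean,
    weight each rollout by exp(-(cost - min_cost) / lam), and average."""
    noises, costs = [], []
    for _ in range(samples):
        eps = [rng.gauss(0.0, sigma) for _ in range(HORIZON)]
        noises.append(eps)
        costs.append(rollout_cost(state, [m + e for m, e in zip(mean, eps)], goal))
    rho = min(costs)  # shift by the minimum cost for numerical stability
    weights = [math.exp(-(c - rho) / lam) for c in costs]
    total = sum(weights)
    return [m + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
            for t, m in enumerate(mean)]


def run(use_prior, steps=50, seed=0, goal=3.0):
    """Closed loop: re-plan every step and apply only the first control."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    mean = [0.0] * HORIZON
    for _ in range(steps):
        if use_prior:
            # Regenerate the sampling prior from the current observation and goal.
            mean = decode_prior(state, high_level_policy(state, goal))
        plan = mppi(state, goal, mean, rng)
        state = step(state, plan[0])
        mean = plan[1:] + plan[-1:]  # warm start, used only when use_prior is False
    return state


print(run(use_prior=True))   # final (position, velocity) with the prior
print(run(use_prior=False))  # same loop with a zero-mean, warm-started prior
```

The sketch only regenerates the prior mean and keeps `sigma` fixed; how the paper parameterizes the full sampling distribution, and how its learned world model is used offline, are not modeled here.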
| Feature | End-to-End RL | MPPI (traditional) | HOLO-MPPI |
|---|---|---|---|
| Generalization across scenarios | Moderate (brittle under shift) | Low (manually tuned prior per scenario) | High (learned prior adapts) |
| Real-time control | Yes (inference only) | Yes | Yes |
| Offline requirement | Large-scale RL training | Manual prior design per scenario | Offline high-level policy learning (MPPI runs online) |
| Robustness to disturbances | Low | Moderate | High (online optimization around learned prior) |
Autonomous Driving Instantiation and Results
The authors instantiated HOLO-MPPI for autonomous driving by designing an effective high-level action space and tailored model architectures. Across diverse driving scenarios, HOLO-MPPI improved on both MPPI and end-to-end RL baselines while maintaining real-time control. The framework sidesteps both the brittleness of end-to-end RL and the scalability problem of hand-designed MPPI priors: the high-level policy, learned offline, proposes scenario-robust plans at run time, and MPPI refines them online, yielding performance gains across varied conditions.
This research could have implications for autonomous systems in logistics, such as warehouse robots and self-driving trucks, where robots must handle unpredictable environments without per-deployment tuning. The combination of offline learning and online optimization offers a path toward scalable, robust motion planning in multi-scenario settings.