Autonomous driving development hinges on the ability to simulate rare, dangerous scenarios, known as long-tail hazards, at scale. Collecting real-world footage of such events is both risky and costly. Editable 3D Gaussian Splatting (3DGS) offers a way to reconstruct real driving scenes and then apply controlled edits, but the resulting videos suffer from a significant Sim-to-Real gap: rendering artifacts, degraded foreground assets, inconsistent lighting, and temporal flickering. According to a paper on arXiv, existing restoration and video generation methods fail to jointly repair these 3DGS-specific issues, improve visual realism, and maintain temporal coherence. To fill this gap, the authors propose RealityBridge, a structure-preserving and asset-aware Sim-to-Real framework for edited 3DGS driving videos.
The Problem: Sim-to-Real Gap in Driving Simulations
According to the paper, long-tail hazardous scenarios are essential for safety-oriented autonomous driving, yet they are difficult to collect and reproduce at scale. Editable 3DGS simulation is a promising alternative: it reconstructs real scenes and allows controllable editing. However, videos rendered from edited 3DGS scenes contain artifacts that degrade realism, including rendering errors, degraded foreground assets, inconsistent lighting, and flickering. Existing video restoration and generation methods are insufficient because they cannot simultaneously address 3DGS-specific artifacts, improve overall visual quality, and ensure frame-to-frame consistency.
RealityBridge: Multimodal Controls and Adaptive Allocation
RealityBridge uses multimodal controls to guide the restoration process. According to the paper, these controls include rendered videos, foreground masks, edge maps, and semantic masks. A lightweight GateNet adaptively allocates these conditions across backbone layers, so that each layer can draw on the control signals most relevant to it. The design is intended to preserve scene structure while improving asset quality and illumination consistency.
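One way to picture adaptive allocation is as a set of per-layer weights over the four control signals. The sketch below is a minimal illustration and not the paper's GateNet: the softmax gating, the three-layer backbone, the logit values, the toy feature vectors, and the names `gate_weights` and `mix_conditions` are all assumptions made for the example.

```python
import math

# The four control signals named in the paper summary.
CONDITIONS = ["rendered_video", "foreground_mask", "edge_map", "semantic_mask"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(gate_logits):
    """Convert one row of logits per backbone layer into per-layer weights
    over the conditions. Each returned row sums to 1."""
    return [softmax(row) for row in gate_logits]


def mix_conditions(features, layer_weights):
    """Weighted sum of the condition feature vectors for a single layer."""
    dim = len(features[CONDITIONS[0]])
    mixed = [0.0] * dim
    for name, weight in zip(CONDITIONS, layer_weights):
        for i, value in enumerate(features[name]):
            mixed[i] += weight * value
    return mixed


# Hypothetical logits for a 3-layer backbone. In a trained model these would
# be produced by the gating network rather than written by hand.
gate_logits = [
    [2.0, 0.0, 0.0, 0.0],  # layer 0 leans on the rendered video
    [0.0, 1.0, 1.0, 0.0],  # layer 1 leans on foreground mask and edges
    [0.0, 0.0, 0.0, 0.0],  # layer 2 treats all conditions equally
]

# Toy 2-dimensional feature vector per condition (assumed values).
features = {
    "rendered_video": [1.0, 0.0],
    "foreground_mask": [0.0, 1.0],
    "edge_map": [1.0, 1.0],
    "semantic_mask": [0.0, 0.0],
}

weights = gate_weights(gate_logits)
layer_inputs = [mix_conditions(features, w) for w in weights]
```

In this toy setup, each backbone layer receives a different blend of the same four conditions, which is the general idea behind allocating conditions per layer rather than feeding every signal to every layer at equal strength.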
Technical Approach: Autoregressive Training and Reward-Guided Post-Training
The authors constructed targeted training data and introduced autoregressive long-video training combined with reward-guided post-training. According to the paper, this combination improves restoration quality and temporal stability and reduces hallucination, where the model invents incorrect details. The autoregressive training enables the model to maintain consistency across long video sequences, a critical requirement for driving simulations. Reward-guided post-training further refines outputs by optimizing for perceptual quality metrics.
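The autoregressive idea can be illustrated with a chunk-by-chunk rollout in which each chunk is restored with the tail of the previous output as context. This is a hedged sketch of the general pattern only: the paper's actual training scheme is not described here, and `autoregressive_restore`, `toy_restore_chunk`, the chunk length, and the scalar "frames" are hypothetical stand-ins.

```python
def autoregressive_restore(frames, chunk_len, context_len, restore_chunk):
    """Restore a long sequence chunk by chunk.

    Each chunk is passed to `restore_chunk` together with the last
    `context_len` frames that were already restored, so later chunks
    are conditioned on earlier outputs.
    """
    if chunk_len <= 0 or context_len < 0:
        raise ValueError("chunk_len must be positive and context_len non-negative")
    restored = []
    for start in range(0, len(frames), chunk_len):
        chunk = frames[start:start + chunk_len]
        # Guard against restored[-0:], which would return the whole list.
        context = restored[-context_len:] if context_len else []
        restored.extend(restore_chunk(context, chunk))
    return restored


def toy_restore_chunk(context, chunk):
    """Hypothetical stand-in for a restoration model.

    Blends each frame with the previous output frame, which smooths
    abrupt changes and makes the effect of the context visible.
    """
    out = []
    prev = context[-1] if context else None
    for frame in chunk:
        value = frame if prev is None else 0.5 * frame + 0.5 * prev
        out.append(value)
        prev = value
    return out


# Toy "video": one scalar intensity per frame, alternating dark and bright.
frames = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

with_context = autoregressive_restore(frames, 2, 1, toy_restore_chunk)
without_context = autoregressive_restore(frames, 2, 0, toy_restore_chunk)
```

With `context_len=0`, every chunk restarts from its raw first frame, so the output jumps at chunk boundaries. Carrying context forward removes those jumps in this toy example, which is the kind of boundary consistency that long-video training is meant to encourage.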
Performance and Results
Extensive experiments were conducted on both internal and public driving datasets. RealityBridge outperformed existing methods in three key areas:
| Area | RealityBridge vs. Existing Methods (qualitative) |
|---|---|
| Artifact removal | Superior removal of rendering artifacts |
| Illumination harmonization | More consistent lighting that matches real-world conditions |
| Long-sequence temporal consistency | Reduced flickering and better frame-to-frame coherence |
The paper states that RealityBridge demonstrates superior results in these areas, though specific numerical metrics are not detailed in the provided abstract.
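Because no numbers are given, it may help to see how temporal flicker could be quantified in principle. The function below is an illustrative proxy only, not a metric reported in the paper: `flicker_score` is a hypothetical name, and frames are assumed to be equal-length lists of pixel intensities in the range 0 to 1.

```python
def flicker_score(frames):
    """Mean absolute intensity change between consecutive frames.

    Lower values mean a steadier sequence. This simple proxy does not
    separate genuine scene motion from flicker, so it is only meaningful
    when comparing renderings of the same underlying sequence.
    """
    if len(frames) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            total += abs(b - a)
            count += 1
    return total / count


# Toy sequences of 2-pixel frames (assumed values).
steady = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
flickering = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]

steady_score = flicker_score(steady)
flickering_score = flicker_score(flickering)
```

A comparison of this kind would be run on the same clip before and after restoration, so that real motion contributes equally to both scores.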
Implications for Autonomous Vehicle Development
For enterprise technology leaders evaluating autonomous driving systems, whether for logistics fleets or passenger vehicles, the ability to generate realistic, editable driving simulations can substantially widen test coverage. RealityBridge addresses a key bottleneck: generating high-fidelity video of rare events without requiring dangerous real-world data collection. By bridging the Sim-to-Real gap, it enables more robust validation of perception and planning algorithms. The framework's use of multimodal controls and a lightweight GateNet suggests it could be integrated into existing simulation pipelines with manageable computational overhead, although the abstract gives no figures on runtime cost.
| Feature | Benefit |
|---|---|
| Multimodal controls (masks, edges, semantics) | Provide structural guidance to preserve scene layout |
| GateNet for adaptive condition allocation | Distributes the control signals across backbone layers according to where they are most relevant |
| Autoregressive long-video training | Maintains temporal consistency over extended sequences |
| Reward-guided post-training | Reduces hallucination and improves perceptual quality |
While the paper focuses on driving datasets, the underlying approach of editing a reconstructed 3DGS scene and then restoring realism has potential applications beyond autonomous driving, including robotics simulation and virtual training environments.