What type of study is this?

This is a quantitative study.

September 29, 2025 · Open Access

DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving

Key Points

DriveX significantly improves 3D future point cloud prediction over prior models.
The model uses self-supervision to learn holistic geometric, semantic, and motion representations from large-scale driving videos, capturing comprehensive scene evolution.
DriveX's Future Spatial Attention aggregates spatiotemporal features from the model's predictions, supporting task-specific inference in dynamic environments.
Extensive experiments validate DriveX's effectiveness across downstream tasks, including occupancy prediction, flow estimation, and end-to-end driving.
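The Future Spatial Attention idea in the key points above can be illustrated with a deliberately simplified sketch. It assumes a flattened grid of cells, a current-frame query feature per cell, and T predicted future features per cell, and it fuses them with plain scaled dot-product attention over time. The function names are hypothetical, learned projections and spatial offsets are omitted, and this is not the paper's FSA implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def future_attention_at_cell(query: Vector, future_feats: List[Vector]) -> Vector:
    """Aggregate T predicted future feature vectors at one spatial cell.

    query: current-frame feature for this cell (length C).
    future_feats: T hypothetical world-model predictions for the same cell,
        one per future timestep (each length C). They serve as both keys and
        values here; a learned model would project them first.
    """
    c = len(query)
    scale = 1.0 / math.sqrt(c)
    # Scaled dot-product score between the current query and each future step.
    scores = [scale * sum(q * k for q, k in zip(query, f)) for f in future_feats]
    weights = softmax(scores)
    # Convex combination of the future features, channel by channel.
    return [sum(w * f[i] for w, f in zip(weights, future_feats)) for i in range(c)]


def future_spatial_attention(queries: List[Vector], futures: List[List[Vector]]) -> List[Vector]:
    """Apply per-cell temporal attention over a flattened grid of N cells.

    queries: N current-frame features; futures: N lists of T future features.
    """
    return [future_attention_at_cell(q, f) for q, f in zip(queries, futures)]


# Usage: one cell whose query aligns with the first future step.
fused = future_spatial_attention([[1.0, 0.0]], [[[10.0, 0.0], [0.0, 10.0]]])
print(fused)  # first channel dominates because step 0 matches the query
```

Because the softmax weights sum to one, each fused feature stays inside the range spanned by the predicted futures; a trained model would learn which future steps matter for each downstream task.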

Abstract

Data-driven learning has advanced autonomous driving, yet task-specific models struggle with out-of-distribution scenarios due to their narrow optimization objectives and reliance on costly annotated data. We present DriveX, a self-supervised world model that learns generalizable scene dynamics and holistic representations (geometric, semantic, and motion) from large-scale driving videos. DriveX introduces Omni Scene Modeling (OSM), a module that unifies multimodal supervision (3D point cloud forecasting, 2D semantic representation, and image generation) to capture comprehensive scene evolution. To simplify learning complex dynamics, we propose a decoupled latent world modeling strategy that separates world representation learning from future state decoding, augmented by dynamic-aware ray sampling to enhance motion modeling. For downstream adaptation, we design Future Spatial Attention (FSA), a unified paradigm that dynamically aggregates spatiotemporal features from DriveX's predictions to enhance task-specific inference. Extensive experiments demonstrate DriveX's effectiveness: it achieves significant improvements in 3D future point cloud prediction over prior work, while attaining state-of-the-art results on diverse tasks including occupancy prediction, flow estimation, and end-to-end driving. These results validate DriveX's capability as a general-purpose world model, paving the way for robust and unified autonomous driving frameworks.
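The abstract names dynamic-aware ray sampling without describing it. A minimal sketch, assuming the goal is to oversample LiDAR rays that hit moving objects so that motion receives more supervision: each ray gets a weight from a hypothetical per-ray motion magnitude, and rays are then drawn without replacement using Efraimidis-Spirakis weighted sampling. The motion signal, `threshold`, `dynamic_boost`, and both function names are illustrative assumptions, not the paper's scheme.

```python
import heapq
import random
from typing import List, Sequence


def ray_weights(motion: Sequence[float], threshold: float = 0.5,
                dynamic_boost: float = 4.0) -> List[float]:
    """Give each ray a sampling weight, boosting rays that hit moving objects.

    motion: hypothetical per-ray motion magnitude (for example, displacement of
        the hit point between consecutive frames). Rays above `threshold` are
        treated as dynamic. `threshold` and `dynamic_boost` are illustrative.
    """
    return [dynamic_boost if m > threshold else 1.0 for m in motion]


def sample_rays(weights: Sequence[float], k: int, seed: int = 0) -> List[int]:
    """Draw k distinct ray indices with probability tied to their weights.

    Uses Efraimidis-Spirakis weighted sampling without replacement: each ray
    gets the key u ** (1 / w) with u uniform in [0, 1), and the k largest keys
    win. All weights must be positive.
    """
    rng = random.Random(seed)
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    return [i for _, i in heapq.nlargest(k, keys)]


# Usage: 8 static rays and 2 dynamic rays (indices 8 and 9), keep 4 per step.
motion = [0.0] * 8 + [1.2, 0.9]
picked = sample_rays(ray_weights(motion), k=4, seed=42)
print(picked)
```

Sampling without replacement keeps the per-step ray budget fixed while still letting static rays through, so the model sees moving objects more often without losing the static background entirely.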

Authors

Shi Chen

Shaoshuai Shi

Kehua Sheng