Scalable simulation with real-world data is critical to the development of autonomous driving because it offers a convenient and practical way to train and test algorithms. Research on generating high-fidelity, consistent driving videos, especially those involving view transformations driven by ego-vehicle action controls, has therefore attracted growing interest. However, existing methods such as Neural Radiance Fields and 3D Gaussian Splatting often generalize poorly and require extensive inputs. Furthermore, 2D generative models can generate various views, yet they leave room for improvement in consistency and realism. To address these limitations, we propose EVGen, a novel video generative model that synthesizes front-view videos of vehicles conditioned on a set of planned trajectories. To improve the consistency of the synthesis, we present a new module that extracts context from neighboring pixels in both the temporal and spatial domains. Additionally, we design an attention module that integrates information both within individual frames and across the corresponding region of the reference frame. Extensive experiments demonstrate that our method outperforms several leading models in front-view driving video generation and that the proposed modules enhance the model's performance. This work presents a new paradigm for goal-oriented video synthesis with minimal observation, enabling on-demand generation to accelerate algorithm development.
Beike Yu
Dafang Wang
Journal of King Saud University - Computer and Information Sciences
Yu et al. studied this question.
www.synapsesocial.com/papers/69b4ba3618185d8a39802f6d — DOI: https://doi.org/10.1007/s44443-026-00627-4