Abstract

Reconstructing a 4D spatio‐temporal representation of a dynamic scene from monocular video is a fundamental yet highly challenging problem in computer vision and computer graphics. Recent advances in 3D Gaussian Splatting (3DGS) for static scenes have significantly improved rendering efficiency and visual fidelity. However, extending 3DGS to dynamic scenes from single‐view input remains difficult, as the lack of dynamic point cloud supervision often hinders the accurate modelling of moving objects, leading to suboptimal performance. In this paper, we introduce SD‐4DGS, a novel 4D Gaussian Splatting (4DGS) method with spatial densification, specifically designed for dynamic scene reconstruction from monocular video. SD‐4DGS features a fast spatial densification strategy that converts sparse point clouds into dense representations to better capture high‐frequency geometry and textures. In addition, a sliding window motion regularizer, together with a stage‐wise training schedule, progressively refines appearance and motion. Experiments show that our method significantly outperforms existing methods in terms of modelling quality across various datasets.
Han et al. (Thu,) studied this question.