Monocular dynamic scene reconstruction is a challenging task due to the inherent limitation of observing the scene from a single viewpoint at each timestamp, particularly in the presence of object motion and illumination changes. Recent methods combine Gaussian Splatting with deformation modeling to enable fast training and rendering; however, their performance in real-world scenarios strongly depends on accurate point cloud initialization. When such initialization is unavailable and random point clouds are used instead, reconstruction quality degrades significantly. To address this limitation, we propose an optimization strategy that relaxes the requirement for accurate initialization in Gaussian-Splatting-based monocular dynamic scene reconstruction. The scene is first reconstructed under a static assumption using all monocular frames, allowing stable convergence of background regions. Based on reconstruction errors, a subset of Gaussians is then activated as dynamic to model motion and deformation. In addition, an annealing jitter regularization term is introduced to improve robustness to camera pose inaccuracies commonly observed in real-world datasets. Extensive experiments on established benchmarks demonstrate that the proposed method enables stable training from randomly initialized point clouds and achieves reconstruction performance comparable to approaches relying on accurate point cloud initialization.
Wang et al. studied this question.
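The two-stage schedule described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the activation criterion or the exact form of the jitter term, so several choices here are hypothetical: the absolute per-Gaussian error threshold, the exponential (log-linear) decay of the jitter standard deviation, and jittering only the camera translation. All function and parameter names (`training_phase`, `dynamic_activation_mask`, `annealed_jitter_std`, `jitter_camera_translation`, `error_threshold`, `static_steps`) are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def training_phase(step: int, static_steps: int) -> str:
    """Hypothetical schedule: optimize every Gaussian as static for the first
    `static_steps` iterations, then allow a subset to become dynamic."""
    return "static" if step < static_steps else "dynamic"


def dynamic_activation_mask(per_gaussian_error: Sequence[float],
                            error_threshold: float) -> List[bool]:
    """Assumed rule: a Gaussian is activated as dynamic if the reconstruction
    error accumulated on it during the static stage exceeds a threshold."""
    return [err > error_threshold for err in per_gaussian_error]


def annealed_jitter_std(step: int, total_steps: int,
                        std_start: float, std_end: float) -> float:
    """Assumed annealing: log-linear decay of the jitter standard deviation
    from std_start at step 0 to std_end at total_steps, clamped afterwards."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if std_start <= 0 or std_end <= 0:
        raise ValueError("standard deviations must be positive")
    t = min(max(step / total_steps, 0.0), 1.0)
    return std_start * (std_end / std_start) ** t


def jitter_camera_translation(translation: Tuple[float, float, float],
                              std: float,
                              rng: random.Random) -> Tuple[float, float, float]:
    """Perturb a camera translation with isotropic Gaussian noise of the given
    standard deviation (rotation jitter is omitted in this sketch)."""
    x, y, z = translation
    return (x + rng.gauss(0.0, std),
            y + rng.gauss(0.0, std),
            z + rng.gauss(0.0, std))


if __name__ == "__main__":
    rng = random.Random(0)
    errors = [0.01, 0.20, 0.03, 0.45]
    print(dynamic_activation_mask(errors, error_threshold=0.1))  # [False, True, False, True]
    for step in (0, 5000, 10000):
        std = annealed_jitter_std(step, 10000, std_start=1e-2, std_end=1e-4)
        print(training_phase(step, static_steps=5000), std,
              jitter_camera_translation((0.0, 0.0, 0.0), std, rng))
```

Decaying the jitter means early iterations see larger pose perturbations, which is one plausible way to make optimization tolerant of inaccurate camera poses, while late iterations see almost none so that fine detail can still converge.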