Monocular 3D object detection has become increasingly popular due to its low cost and ease of deployment. It remains challenging, however, because depth is difficult to estimate from a single image and training datasets are imbalanced. To tackle these challenges, we propose a Spatiotemporally Consistent Pseudo-labels Module (SCPM) that aims to enhance the performance of monocular 3D object detection models. The method leverages spatiotemporal priors and data augmentation to generate reliable, temporally consistent pseudo-labels, effectively mitigating survivorship bias. In addition, we introduce a depth decoupling module guided by geometric priors to improve depth estimation and spatial localization, particularly under occlusion. The proposed framework enhances detection robustness and reduces missed detections. Extensive experiments on the KITTI dataset demonstrate that our method significantly outperforms existing monocular 3D detection approaches, achieving state-of-the-art performance.
Wang et al. (Thu,) studied this question.