Monocular depth estimation is an essential component in computer vision that enables 3D scene understanding, with critical applications in autonomous driving and augmented reality. This paper proposes a lightweight self-supervised framework from single RGB images for multi-scale feature extraction and artifact elimination in monocular depth estimation (MsGf). The proposed framework first designs a Cross-Dimensional Multi-scale Feature Extraction (CDMs) module. The CDMs module combines parallel multi-scale convolution with sequential feature convolutions to achieve multi-scale feature extraction with minimal parameters. Additionally, a Sobel Edge Perception-Guided Filtering (SEGF) module is proposed. The SEGF module uses the Sobel operator to decompose the features into horizontal direction features and vertical direction features, and then generates the filter kernel through two steps of filtering to effectively suppress artifacts and better capture structural and edge features. A large number of ablation experiments and comparative experiments on the KITTI and Make3D datasets demonstrate that the MsGf with only 0.8 M parameters can achieve better performance than the current most advanced methods.
Tian et al. (Thu,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: