Monocular depth estimation (MDE) is a fundamental problem in computer vision with broad applications in downstream tasks. While recent studies focus on designing increasingly complex and powerful deep learning methods that regress depth maps directly, we introduce the virtual point cloud (VPC) as an intermediate representation that provides an approximate geometric prior for the MDE task. We design an MDE framework enhanced by multi-scale, multi-space representation fusion. To resolve scale ambiguity, a VPC feature extraction module learns multi-scale 3D geometric information for the depth prior. We then explicitly introduce geometric constraints for global depth prediction by fusing texture features from 2D space with geometric features from 3D space. To mitigate errors at object boundaries, we refine the predicted depth map with a confidence map generated from the quality of the VPC: convolution receptive fields are constructed from 3D spatial distances in spherical coordinates, so that the confidence map provides reliable geometric guidance at object boundaries. Furthermore, we propose an independent confidence geometric consistency loss to supervise the refinement process. Experimental results show that our method outperforms state-of-the-art approaches across all evaluation metrics on the KITTI and NYU-Depth-v2 datasets, with RMSE improvements of 9.2% and 2.8%, respectively. Zero-shot evaluations on the nuScenes and SUN-RGBD datasets further validate the generalizability of our approach.
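The abstract does not say how the VPC is built or how the spherical coordinates are defined. As a rough illustration only, the sketch below assumes the point cloud comes from back-projecting a coarse depth map through a pinhole camera model and then expresses each point as range, azimuth, and elevation. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the axis convention (x right, y down, z forward) are all hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H, W, 3) point map (assumed pinhole model)."""
    h, w = depth.shape
    # u holds pixel column indices, v holds pixel row indices.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def to_spherical(points):
    """Convert Cartesian (x, y, z) points to (range, azimuth, elevation)."""
    x, y, z = points[..., 0], points[..., 1], points[..., 2]
    r = np.linalg.norm(points, axis=-1)
    azimuth = np.arctan2(x, z)  # angle around the vertical axis
    # Guard against division by zero for points at the origin.
    ratio = np.divide(y, r, out=np.zeros_like(r), where=r > 0)
    elevation = np.arcsin(np.clip(ratio, -1.0, 1.0))
    return np.stack([r, azimuth, elevation], axis=-1)

# Hypothetical usage with a flat synthetic depth map and made-up intrinsics.
depth = np.full((4, 6), 2.0)
pts = backproject_depth(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
sph = to_spherical(pts)
```

In spherical coordinates, neighbouring pixels that straddle an object boundary are separated by a large jump in range, which is one plausible reason to measure neighbourhood distance in this space rather than on the 2D pixel grid.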
Lin Bie
Siqi Li
Xiaopin Zhong
ACM Transactions on Multimedia Computing Communications and Applications
Tsinghua University
Shenzhen University
Bie et al. studied this question.
www.synapsesocial.com/papers/68de5d9383cbc991d0a20163 — DOI: https://doi.org/10.1145/3770076