What type of study is this?

This is a quantitative study.

October 2, 2025 · Open Access

Multi-space Representation Fusion Enhanced Monocular Depth Estimation via Virtual Point Cloud

Key Points

The proposed method enhances monocular depth estimation by integrating a virtual point cloud as a geometric prior.
Experimental results show a significant RMSE improvement of 9.2% on the KITTI dataset and 2.8% on NYU-Depth-v2.
A confidence map improves depth predictions by incorporating quality metrics derived from 3D spatial distances.
This method demonstrates strong generalizability across diverse datasets, including nuScenes and SUN-RGBD.
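
To make the reported metric concrete, the following is a minimal sketch of how RMSE over depth values and a relative improvement between two RMSE scores are commonly computed. The function names and toy values are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol (valid-pixel masking, depth caps) is not described on this page. The sketch also assumes the quoted percentages are relative reductions with respect to a baseline RMSE.

```python
import math


def rmse(pred, gt):
    """Root-mean-square error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


def relative_improvement(baseline, ours):
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline - ours) / baseline


# Toy flattened depth values in metres; 0.0 marks a pixel without ground truth.
gt = [2.0, 4.0, 0.0, 10.0]
pred = [2.5, 3.5, 7.0, 9.0]

print(rmse(pred, gt))                    # error over the three valid pixels
print(relative_improvement(2.0, 1.816))  # hypothetical baseline vs. new RMSE
```

Under this reading, a 9.2% improvement means the new RMSE is 90.8% of the baseline RMSE, whatever the absolute values are.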

Abstract

Monocular depth estimation (MDE) is a fundamental problem in computer vision with broad applications in various downstream tasks. While recent studies focus on designing increasingly complex and powerful deep learning methods to regress depth maps directly, we propose a novel approach by introducing the virtual point cloud (VPC) as an intermediate representation to provide the approximate geometric prior for the MDE task. In this paper, we design a multi-scale multi-space representation fusion-enhanced monocular depth estimation framework to address the challenges of MDE. Specifically, to resolve the issue of scale ambiguity, we design a VPC feature extraction module to learn multi-scale 3D geometric information for the depth prior. Then, we explicitly introduce geometric constraints for global depth prediction by incorporating a multi-space representation fusion from both the texture features in 2D space and the geometric features in 3D space. To mitigate errors at object boundaries, we introduce a confidence map generated based on the quality of the VPC to refine the predicted depth map. Specifically, we construct convolution receptive fields based on 3D spatial distances in spherical coordinates, ensuring that the confidence map provides reliable geometric guidance at object boundaries. Furthermore, we propose an independent confidence geometric consistency loss to supervise the refinement process. Experimental results demonstrate that our method significantly outperforms state-of-the-art approaches across all evaluation metrics on the KITTI and NYU-Depth-v2 datasets, achieving RMSE improvements of 9.2% and 2.8%, respectively. Moreover, zero-shot evaluations on the nuScenes and SUN-RGBD datasets further validate the generalizability of our approach.
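
As a rough illustration of the geometry involved, the sketch below lifts a depth map to 3D points and converts a point to spherical coordinates. It assumes the VPC is obtained by back-projecting a coarse depth estimate through a pinhole camera model, which the abstract does not spell out. The intrinsics `fx`, `fy`, `cx`, `cy`, the function names, and the angle conventions are all hypothetical choices for this example, not the authors' implementation.

```python
import math


def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows, in metres) to 3D camera-frame points.

    Assumes a pinhole camera; pixels with non-positive depth are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def to_spherical(point):
    """Convert (x, y, z) to (range, azimuth, elevation) in radians.

    Assumed convention: azimuth is measured in the x-z plane from the
    optical axis (z), and elevation is the angle out of that plane along y.
    """
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        return 0.0, 0.0, 0.0
    azimuth = math.atan2(x, z)
    # Clamp to guard against rounding pushing the ratio outside [-1, 1].
    elevation = math.asin(max(-1.0, min(1.0, y / r)))
    return r, azimuth, elevation


# Toy 2x2 depth map; the 0.0 entry stands for a pixel with no depth.
depth = [[2.0, 0.0],
         [2.0, 4.0]]
cloud = backproject(depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
spherical = [to_spherical(p) for p in cloud]
```

In spherical coordinates, two points that are adjacent in the image but far apart in range become easy to tell apart, which is one plausible reason to define neighbourhoods by 3D distance near object boundaries.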

Authors

Lin Bie

Siqi Li

Xiaopin Zhong

Journals

ACM Transactions on Multimedia Computing Communications and Applications

Institutions

Tsinghua University

Shenzhen University