What question did this study set out to answer?

Can integrating geometric depth information improve the accuracy of multi-view stereo reconstruction?

March 24, 2026

Towards Neural Multi View 3D Reconstruction from RGB-D Data

Key Points

Explored the integration of geometric depth information to enhance multi-view stereo reconstruction accuracy.
Developed the RGB-D MVSNet architecture, which combines a depth-fusion encoder with a learning-based MVS backbone.
Constructed a unified feature volume from photometric and geometric features.
Evaluated on the Sk3D dataset, which contains synchronized RGB, ToF depth, and structured-light scans.
Improved accuracy and completeness metrics compared to the RGB-only baseline.
Achieved some qualitative improvements in reconstructing textureless and glossy regions.
Reduced typical artifacts when given high-quality depth input.

Abstract

Multi-view stereo (MVS) networks have recently achieved remarkable progress in dense 3D reconstruction, yet they remain fundamentally limited by their reliance on photometric cues. As a result, current methods fail in textureless, reflective, or non-Lambertian regions. At the same time, commodity time-of-flight (ToF) sensors provide geometric depth information that is complementary but low-resolution and noisy. In this work, we study the possibility of using 3D features extracted from depth data to overcome these MVS limitations. To this end, we develop RGB-D MVSNet, an end-to-end architecture that integrates a depth-fusion encoder with a modern learning-based MVS backbone. Our method constructs a unified feature volume from both photometric and geometric features, which is then fused and regularized in a common decoder. We evaluate the approach on the challenging Sk3D dataset, which contains synchronized RGB, ToF depth, and high-quality structured-light scans. Experiments demonstrate that our method improves accuracy and completeness metrics over the RGB-only baseline and achieves some qualitative improvements in reconstructing textureless and glossy regions. Additional experiments with high-quality depth input show that the method can eliminate typical artifacts when the input depth quality is better. These results indicate that integrating geometric cues into MVS pipelines is a promising direction towards more robust, generalizable 3D reconstruction.
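The abstract reports accuracy and completeness without defining them. As a rough illustration only, the sketch below assumes the common point-cloud convention from MVS benchmarks: accuracy is the mean distance from each reconstructed point to its nearest reference point, and completeness is the mean distance from each reference point to its nearest reconstructed point. The function names and the toy point sets are hypothetical, and the study's actual evaluation protocol on Sk3D may differ (for example, by using distance thresholds).

```python
import math


def mean_nearest_distance(source, target):
    """Mean Euclidean distance from each source point to its nearest target point."""
    if not source or not target:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for p in source:
        # Brute-force nearest neighbour; fine for a toy example only.
        total += min(math.dist(p, q) for q in target)
    return total / len(source)


def accuracy_and_completeness(reconstruction, reference):
    """Return (accuracy, completeness) under the assumed convention; lower is better."""
    # Accuracy: how close reconstructed points lie to the reference surface.
    accuracy = mean_nearest_distance(reconstruction, reference)
    # Completeness: how well the reference surface is covered by the reconstruction.
    completeness = mean_nearest_distance(reference, reconstruction)
    return accuracy, completeness


# Hypothetical toy data: the reconstruction misses the third reference point.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)]

acc, comp = accuracy_and_completeness(reconstruction, reference)
print(f"accuracy={acc:.4f} completeness={comp:.4f}")
```

In this toy case the reconstructed points sit close to the reference, so accuracy is small, while the uncovered third reference point inflates completeness. This is the kind of gap that missing geometry in textureless or glossy regions would produce.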
