Multi-view stereo (MVS) networks have recently achieved remarkable progress in dense 3D reconstruction, yet they remain fundamentally limited by their reliance on photometric cues. As a result, current methods fail in textureless, reflective, or non-Lambertian regions. At the same time, commodity time-of-flight (ToF) sensors provide geometric depth information that is complementary but low-resolution and noisy. In this work, we study the possibility of using 3D features extracted from depth data to overcome these MVS limitations. To this end, we develop RGB-D MVSNet, an end-to-end architecture that integrates a depth-fusion encoder with a modern learning-based MVS backbone. Our method constructs a unified feature volume from both photometric and geometric features, which is then fused and regularized in a common decoder. We evaluate the approach on the challenging Sk3D dataset, which contains synchronized RGB images, ToF depth, and high-quality structured-light scans. Experiments demonstrate that our method improves accuracy and completeness metrics over the RGB-only baseline and achieves some qualitative improvements in reconstructing textureless and glossy regions. Additional experiments with high-quality depth input show that the method can eliminate typical artifacts when the input depth quality is better. These results indicate that integrating geometric cues into MVS pipelines is a promising direction towards more robust, generalizable 3D reconstruction.
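To make the idea of a unified feature volume concrete, the following is a minimal per-voxel sketch, not the authors' implementation. It assumes a variance-based photometric cost across views (a common choice in learning-based MVS) and replaces the learned depth-fusion encoder with a hand-written Gaussian cue around the sensor depth. All function names and the `sigma` parameter are hypothetical.

```python
import math


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def photometric_cost(view_feats):
    """Per-channel variance across views at a single voxel.

    view_feats[v][c] is the feature of channel c sampled from view v.
    Low variance means the views agree photometrically at this voxel.
    """
    n_channels = len(view_feats[0])
    return [variance([feats[c] for feats in view_feats]) for c in range(n_channels)]


def geometric_cue(depth_hypothesis, sensor_depth, sigma):
    """Assumed stand-in for a learned geometric feature: a Gaussian that
    peaks where the depth hypothesis matches the (noisy) sensor depth."""
    return math.exp(-0.5 * ((depth_hypothesis - sensor_depth) / sigma) ** 2)


def unified_voxel_feature(view_feats, depth_hypothesis, sensor_depth, sigma=0.05):
    """Concatenate photometric and geometric channels for one voxel.

    A real network would pass the resulting volume to a 3D decoder
    for fusion and regularization; that step is omitted here.
    """
    photo = photometric_cost(view_feats)
    geo = geometric_cue(depth_hypothesis, sensor_depth, sigma)
    return photo + [geo]


# Two views, two feature channels, hypothesis equal to the sensor depth.
feature = unified_voxel_feature([[1.0, 2.0], [1.0, 4.0]], 1.0, 1.0)
```

In a textureless region the photometric channels are uninformative, because features agree at every depth hypothesis, while the geometric channel still singles out the hypotheses near the sensor reading. This is the kind of complementarity the abstract describes.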
G. Bobrovskikh
Oleg Voynov
Evgeny Burnaev
Doklady Mathematics
Skolkovo Institute of Science and Technology
www.synapsesocial.com/papers/69c22975aeb5a845df0d3eef — DOI: https://doi.org/10.1134/s1064562425700619