Abstract: Predicting scene depth from monocular images has become crucial in fields such as spatial perception and computer vision. However, unsupervised depth estimation methods based on view synthesis often ignore the significant impact of non-Lambertian surfaces and ghosting artifacts. In this study, we propose a self-learning depth reconstruction framework. The framework introduces a depth consistency loss to compensate for the failure of the photometric assumption in non-Lambertian regions. Additionally, we design an intrinsic consistency loss that leverages variance as a game-theoretic strategy to ensure the robustness of our model. We further introduce a physics-inspired ghosting mask to eliminate ghosting artifacts. Finally, we design a Multi-Path Transformer layer that integrates the Transformer's global dependency modeling capability into CNNs, thereby enhancing the model's performance. Experimental results show that our model performs well in non-Lambertian regions. Compared with state-of-the-art methods that rely solely on the photometric assumption, our method achieves average improvements of 9.29% on Sq Rel and 2.86% on RMSE across three network models. Furthermore, it exhibits strong zero-shot generalization on external datasets. The source code is available at https://github.com/IkeFwd/Icdepth.
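For readers unfamiliar with the two error metrics named in the abstract, the following is a minimal sketch of how Sq Rel and RMSE are conventionally computed in monocular depth benchmarks. It is not taken from the authors' repository. The function names `sq_rel`, `rmse`, and `relative_improvement` are hypothetical, depth maps are flattened to plain lists for simplicity, and reading the reported percentages as relative error reductions against a baseline is an assumption.

```python
import math


def _valid_pairs(pred, gt):
    # Keep only pixels with a positive ground-truth depth (assumed validity rule).
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def sq_rel(pred, gt):
    """Squared relative error: mean of (gt - pred)^2 / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum((g - p) ** 2 / g for p, g in pairs) / len(pairs)


def rmse(pred, gt):
    """Root mean squared error over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return math.sqrt(sum((g - p) ** 2 for p, g in pairs) / len(pairs))


def relative_improvement(baseline_error, our_error):
    """Percentage reduction of an error metric relative to a baseline.

    Assumption: this is one common way to report an 'improvement';
    the abstract does not state the exact formula used.
    """
    return 100.0 * (baseline_error - our_error) / baseline_error


if __name__ == "__main__":
    # Toy depths in metres; the zero ground-truth entry is skipped as invalid.
    predicted = [2.0, 4.0, 3.0]
    ground_truth = [1.0, 5.0, 0.0]
    print("Sq Rel:", sq_rel(predicted, ground_truth))
    print("RMSE:", rmse(predicted, ground_truth))
```

Both metrics are errors, so lower values are better. Sq Rel divides each squared error by the true depth, which weights a given error more heavily at close range than RMSE does.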
Ke Li
Bolin Song
Naiyao Wang
Machine Learning Science and Technology
Li et al. studied this question.
www.synapsesocial.com/papers/68d44c3431b076d99fa5520d — DOI: https://doi.org/10.1088/2632-2153/ae054b