ABSTRACT Multi‐view depth estimation is often more accurate than monocular depth estimation. However, traditional multi‐view methods tend not to exploit semantic information fully and struggle to fuse information from multiple views effectively, which degrades predictions in challenging scenarios such as texture‐less regions and reflective surfaces. To address these limitations, we present MVI‐Depth, a novel framework with two core innovations: (1) a Semantic Fusion Module (SFM) that establishes semantic correspondence, and (2) a Depth Updating Module (DUM) that enables iterative depth refinement. Specifically, MVI‐Depth first builds a main‐view representation that integrates single‐view depth, depth features, and semantic features. Features extracted from neighbouring views are then used to construct the original cost volume. Because direct use of the cost volume has inherent limitations in complex scenes, the proposed SFM constructs an aligned semantic cost volume that exploits the complementarity between semantic and depth information, yielding an improved final cost volume. The proposed DUM then updates this final cost volume to optimise depth iteratively. Comprehensive evaluations show that MVI‐Depth outperforms existing methods on all standard metrics on both the ScanNet and KITTI benchmarks. Additional experiments on the 7‐Scenes dataset further confirm that the framework generalises robustly to diverse environments.
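As a rough illustration of the pipeline described above, the following toy sketch works on a single pixel: it blends a depth matching cost with a semantic matching cost, reads out a depth by soft‐argmin, and then nudges an initial single‐view depth towards the cost‐volume estimate over a few iterations. It is not the authors' implementation. The function names, the weighted‐sum fusion, the soft‐argmin readout, the local lookup window, and the fixed step size are all assumptions made for illustration; in MVI‐Depth the SFM and DUM are presumably learned network modules.

```python
import math


def fuse_costs(depth_cost, semantic_cost, semantic_weight=0.5):
    """Blend two per-hypothesis costs (lower = better match).

    Assumption: a plain weighted sum stands in for the learned fusion.
    """
    if len(depth_cost) != len(semantic_cost):
        raise ValueError("cost vectors must cover the same depth hypotheses")
    return [
        (1.0 - semantic_weight) * d + semantic_weight * s
        for d, s in zip(depth_cost, semantic_cost)
    ]


def soft_argmin(costs, depths, temperature=1.0):
    """Expected depth under a softmax over negated costs."""
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest) / temperature) for c in costs]
    total = sum(weights)
    return sum(w * z for w, z in zip(weights, depths)) / total


def refine_depth(initial_depth, costs, depths, steps=4, radius=1, step_size=0.5):
    """Iteratively move a depth estimate towards the local cost minimum.

    Assumption: each step looks up costs in a small window of hypotheses
    around the current estimate and takes a fixed-size step; a learned
    update module would replace this rule.
    """
    depth = initial_depth
    for _ in range(steps):
        nearest = min(range(len(depths)), key=lambda i: abs(depths[i] - depth))
        lo = max(0, nearest - radius)
        hi = min(len(depths), nearest + radius + 1)
        target = soft_argmin(costs[lo:hi], depths[lo:hi])
        depth += step_size * (target - depth)
    return depth


# Hypothetical texture-less pixel: the depth cost is flat (ambiguous),
# while the semantic cost favours the hypothesis at 3.0.
depths = [1.0, 2.0, 3.0, 4.0, 5.0]
depth_cost = [5.0, 5.0, 5.0, 5.0, 5.0]
semantic_cost = [4.0, 2.0, 0.0, 2.0, 4.0]

fused = fuse_costs(depth_cost, semantic_cost)
refined = refine_depth(1.5, fused, depths)
```

In this made‐up example the depth cost alone cannot distinguish the hypotheses, so the semantic term is what gives the fused cost a clear minimum; the refinement loop then pulls the initial estimate of 1.5 towards it.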
Zhu et al. studied this question.