What type of study is this?

This is an experimental study.

October 16, 2025 | Open Access

MVI‐Depth: Multi‐View Indoor Depth Estimation Based on the Fusion of Semantic Information

Key Points

MVI-Depth improves depth estimation accuracy in texture-less regions and on reflective surfaces, where traditional multi-view methods perform poorly.
The Semantic Fusion Module (SFM) constructs an aligned semantic cost volume that exploits the complementarity between semantic and depth information, yielding an improved final cost volume.
The Depth Updating Module (DUM) updates the final cost volume to refine the depth estimate iteratively.
Evaluations on the ScanNet and KITTI benchmarks show that MVI-Depth outperforms existing methods on all standard metrics, and additional experiments on the 7-Scenes dataset support its generalization capabilities.

Abstract

Compared to monocular depth estimation, multi‐view depth estimation often yields more accurate results. However, traditional multi‐view depth estimation methods often fail to leverage semantic information fully and struggle to effectively fuse information from multiple views, leading to suboptimal prediction performance in challenging scenarios such as texture‐less regions and reflective surfaces. To address these limitations, we present MVI‐Depth, a novel framework with two core innovations: (1) a Semantic Fusion Module (SFM) that establishes semantic correspondence, and (2) a Depth Updating Module (DUM) enabling iterative depth refinement. Specifically, MVI‐Depth initially establishes a main view representation that integrates single‐view depth, depth features, and semantic features. Subsequent feature extraction from neighbouring views enables the construction of the original cost volume. Recognising the inherent limitations of direct cost volume utilisation in complex scenes, the proposed SFM constructs an aligned semantic cost volume to utilise the complementarity between semantic and depth information, forming an improved final cost volume. The final cost volume is updated through the proposed DUM to achieve iterative depth optimisation. Comprehensive evaluations demonstrate that MVI‐Depth achieves superior performance across all standard metrics on both ScanNet and KITTI benchmarks, outperforming existing methods. Additional experiments on the 7‐Scenes dataset further confirm the framework's robust generalisation capabilities in diverse environments.
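To make the cost-volume terminology in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the neighbouring-view features have already been warped to the main view for each depth hypothesis, scores each hypothesis by feature correlation, blends that depth cost volume with a semantic one, and regresses depth by a softmax-weighted average. The function names, the fixed blending weight `alpha`, and the placeholder semantic volume are all hypothetical; the learned SFM and the iterative DUM described in the paper are not reproduced here.

```python
import numpy as np


def correlation_cost_volume(ref_feat, warped_feats):
    """Score each depth hypothesis by feature correlation.

    ref_feat:     (C, H, W) main-view features.
    warped_feats: (D, C, H, W) neighbouring-view features, assumed to be
                  already warped to the main view for D depth hypotheses.
    Returns a (D, H, W) volume where a higher value means a better match.
    """
    channels = ref_feat.shape[0]
    return np.einsum("chw,dchw->dhw", ref_feat, warped_feats) / channels


def fuse_volumes(depth_cost, semantic_cost, alpha=0.5):
    """Hypothetical stand-in for semantic fusion: a fixed convex blend."""
    return (1.0 - alpha) * depth_cost + alpha * semantic_cost


def soft_argmax_depth(cost, depth_values, temperature=1.0):
    """Regress depth as the softmax-weighted mean of the hypotheses."""
    logits = cost / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return np.einsum("dhw,d->hw", prob, depth_values)


# Toy data: unit-norm random features, with hypothesis 3 made the exact match.
rng = np.random.default_rng(0)
C, D, H, W = 8, 6, 4, 5
ref = rng.normal(size=(C, H, W))
ref /= np.linalg.norm(ref, axis=0, keepdims=True)
warped = rng.normal(size=(D, C, H, W))
warped /= np.linalg.norm(warped, axis=1, keepdims=True)
warped[3] = ref

depth_values = np.linspace(0.5, 5.0, D)
cost = correlation_cost_volume(ref, warped)
semantic_cost = cost.copy()  # placeholder: a real semantic volume would differ
final = fuse_volumes(cost, semantic_cost)
depth = soft_argmax_depth(final, depth_values, temperature=0.01)
```

Because the features are unit-normalised, the correlation is largest where the warped feature equals the main-view feature, so the best-scoring hypothesis at every pixel in this toy setup is index 3. In a learned system the blend would typically be predicted by a network rather than fixed, which is the role the abstract assigns to the SFM.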
