What question did this study set out to answer?

The aim is to enhance multi-frame depth estimation in complex indoor scenarios using monocular priors.

January 23, 2026 | Open Access

MonoPrior-Fusion: Monocular-Prior-Guided Multi-Frame Depth Estimation with Multi-Scale Geometric Fusion

Key Points

Aims to improve multi-frame depth estimation in challenging indoor scenes using monocular priors.
Integrates pixel-wise monocular priors directly into the multi-view matching process.
Modulates cost-volume hypotheses to disambiguate matches.
Uses a hierarchical, multi-scale fusion architecture to propagate global and local geometric information.
Introduces a geometric consistency loss based on virtual planes to enhance global 3D coherence.
Reports significant improvements over state-of-the-art multi-frame baselines on ScanNetV2, 7Scenes, TUM RGB-D, and GMU Kitchens.
Generalizes well across unseen domains.
Yields more accurate and complete 3D reconstructions when integrated into a volumetric fusion pipeline.
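
To make the idea of modulating cost-volume hypotheses with a monocular prior more concrete, here is a minimal single-pixel sketch. It is not the MPF implementation: the quadratic penalty, the `sigma` parameter, the soft-argmin readout, and the function names `modulate_costs` and `soft_argmin_depth` are all assumptions chosen for illustration. The example pixel has two equally good matches, as can happen on a weakly textured wall, and the prior is used to pick between them.

```python
import math

def modulate_costs(costs, hypotheses, prior_depth, sigma):
    """Penalize each depth hypothesis by its squared distance from the monocular prior.

    Assumed form (not from the paper): cost + 0.5 * ((d - prior) / sigma)^2.
    """
    return [
        c + 0.5 * ((d - prior_depth) / sigma) ** 2
        for c, d in zip(costs, hypotheses)
    ]

def soft_argmin_depth(costs, hypotheses):
    """Expected depth under a softmax over negative costs."""
    m = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - m)) for c in costs]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, hypotheses)) / total

if __name__ == "__main__":
    hypotheses = [1.0, 2.0, 3.0, 4.0]   # candidate depths in metres (hypothetical)
    costs = [0.2, 1.0, 0.2, 1.0]        # ambiguous: 1.0 m and 3.0 m match equally well
    prior_depth, sigma = 3.1, 0.5       # hypothetical monocular estimate and confidence

    print("multi-view only:", soft_argmin_depth(costs, hypotheses))
    guided = modulate_costs(costs, hypotheses, prior_depth, sigma)
    print("prior-guided:   ", soft_argmin_depth(guided, hypotheses))
```

Without the prior, the readout lands between the two competing hypotheses; with it, the estimate moves close to the hypothesis that the prior supports. A smaller `sigma` trusts the prior more, which is a design choice this sketch leaves open.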

Abstract

Precise 3D perception is critical for indoor robotics, augmented reality, and autonomous navigation. However, existing multi-frame depth estimation methods often suffer from significant performance degradation in challenging indoor scenarios characterized by weak textures, non-Lambertian surfaces, and complex layouts. To address these limitations, we propose MonoPrior-Fusion (MPF), a novel framework that integrates pixel-wise monocular priors directly into the multi-view matching process. Specifically, MPF modulates cost-volume hypotheses to disambiguate matches and employs a hierarchical fusion architecture across multiple scales to propagate global and local geometric information. Additionally, a geometric consistency loss based on virtual planes is introduced to enhance global 3D coherence. Extensive experiments on ScanNetV2, 7Scenes, TUM RGB-D, and GMU Kitchens demonstrate that MPF achieves significant improvements over state-of-the-art multi-frame baselines and generalizes well across unseen domains. Furthermore, MPF yields more accurate and complete 3D reconstructions when integrated into a volumetric fusion pipeline, proving its effectiveness for dense mapping tasks. The source code will be made publicly available to support reproducibility and future research.
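
The abstract does not spell out the virtual-plane loss, so the following is only a sketch of one plausible form, under stated assumptions: three pixels are back-projected with pinhole intrinsics, the plane through the resulting 3D points is treated as a "virtual plane", and the loss is one minus the cosine between the predicted and reference plane normals. The function names, the intrinsics tuple `(fx, fy, cx, cy)`, and the choice of a single triplet are hypothetical.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points (cross product of two edges)."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    n = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    norm = math.sqrt(sum(x * x for x in n))
    if norm < 1e-12:
        raise ValueError("degenerate triplet: points are collinear")
    return tuple(x / norm for x in n)

def virtual_plane_loss(pixels, pred_depths, ref_depths, intrinsics):
    """1 - cosine similarity between predicted and reference virtual-plane normals.

    Assumed loss form for illustration; a training loss would average this
    over many sampled triplets.
    """
    fx, fy, cx, cy = intrinsics
    pred = [backproject(u, v, d, fx, fy, cx, cy) for (u, v), d in zip(pixels, pred_depths)]
    ref = [backproject(u, v, d, fx, fy, cx, cy) for (u, v), d in zip(pixels, ref_depths)]
    n_pred = plane_normal(*pred)
    n_ref = plane_normal(*ref)
    return 1.0 - sum(a * b for a, b in zip(n_pred, n_ref))

if __name__ == "__main__":
    pixels = [(100, 100), (300, 120), (200, 300)]   # hypothetical triplet
    intrinsics = (500.0, 500.0, 320.0, 240.0)       # hypothetical fx, fy, cx, cy
    wall = [2.0, 2.0, 2.0]                          # fronto-parallel reference surface
    print(virtual_plane_loss(pixels, wall, wall, intrinsics))
    print(virtual_plane_loss(pixels, [2.0, 2.0, 2.5], wall, intrinsics))
```

Because the loss compares plane orientations built from widely separated pixels, it responds to errors in overall 3D structure rather than only to per-pixel depth differences, which is one way a term like this could encourage global coherence.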

Cite This Study

Lin et al. (2026) studied this question.

synapsesocial.com/papers/69731022c8125b09b0d1fe0a https://doi.org/10.3390/s26020712
