March 13, 2026 | Open Access

Zero-Shot Polarization-Intensity Physical Fusion Monocular Depth Estimation for High Dynamic Range Scenes

Key Points

The research aims to improve monocular 3D reconstruction in challenging environments like highway tunnels through a novel fusion framework.
Proposes a physics-aware deep learning framework combining polarization sensing with intensity imaging.
Utilizes a Modality-Aligned Parameter Injection strategy that remaps the input-layer weights to adapt a pre-trained Vision Transformer (MiDaS) to multi-modal inputs.
Implements a Reliability-Aware Gating mechanism that re-weights appearance and geometric cues based on intensity saturation and polarization signal validity, measured by the Degree of Linear Polarization (DoLP).
Validated on the self-constructed POLAR-GLV benchmark, a real-world dataset collected for high dynamic range tunnel scenarios.
Reduced geometric reconstruction error by 24.2% in high-glare tunnel exit zones compared to intensity-only baselines.
Reduced geometric reconstruction error by 10.0% at tunnel entrances compared to intensity-only baselines.
Compared to multi-stream fusion architectures, these gains come with negligible additional computational cost, improving feasibility for resource-constrained onboard inference.
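
The study describes Modality-Aligned Parameter Injection only as a remapping of the input layer's weight space, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It widens a 3-channel patch-embedding kernel to accept four extra polarization channels while copying the pre-trained RGB filters unchanged. The function names, the nested-list weight layout, and the mean-of-RGB initialization with a `scale` factor are all assumptions made for illustration.

```python
from typing import List

# Hypothetical layout: weight[out_filter][in_channel][k], where k indexes the
# flattened pixels of one patch. A real ViT patch embedding stores a 4-D tensor.
Kernel = List[List[List[float]]]


def inject_polarization_channels(rgb_weight: Kernel, n_extra: int = 4,
                                 scale: float = 0.0) -> Kernel:
    """Widen a 3-channel input kernel to 3 + n_extra channels.

    The RGB filters are copied unchanged. Each new channel starts as the
    per-filter mean of the RGB filters times `scale` (an assumed choice).
    With scale=0.0 the widened layer reproduces the original output exactly.
    """
    widened = []
    for filt in rgb_weight:
        n_in = len(filt)
        mean = [sum(ch[k] for ch in filt) / n_in for k in range(len(filt[0]))]
        extra = [[scale * m for m in mean] for _ in range(n_extra)]
        widened.append([list(ch) for ch in filt] + extra)
    return widened


def project_patch(weight: Kernel, patch: List[List[float]]) -> List[float]:
    """Apply the kernel to one patch given as patch[channel][k]."""
    return [
        sum(w * x for w_ch, x_ch in zip(filt, patch) for w, x in zip(w_ch, x_ch))
        for filt in weight
    ]
```

With `scale=0.0` the widened layer initially ignores the polarization channels, so the backbone sees the same embeddings as before and fine-tuning can learn the new weights gradually. A non-zero `scale` would instead start the new channels from an averaged copy of the RGB filters.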

Abstract

Monocular 3D reconstruction remains a persistent challenge for autonomous driving systems in Degraded Visual Environments (DVEs) with extreme glare and low illumination, such as highway tunnels, due to the lack of reliable texture cues. This paper proposes a physics-aware deep learning framework that overcomes these limitations by fusing polarization sensing with conventional intensity imaging. Unlike traditional end-to-end data-driven fusion strategies, we propose a Modality-Aligned Parameter Injection strategy. By remapping the weight space of the input layer, this strategy achieves a smooth transfer of the pre-trained Vision Transformer (i.e., MiDaS) to multi-modal inputs. Its core advantage lies in the seamless integration of four-channel polarization geometric information while fully preserving the pre-trained semantic representation capabilities of the backbone network, thereby avoiding the overfitting risk associated with training from scratch on small-sample data. Furthermore, we design a Reliability-Aware Gating mechanism that dynamically re-weights appearance and geometric cues based on intensity saturation and the physical validity of polarization signals as measured by the Degree of Linear Polarization (DoLP). We validate the proposed method on our self-constructed POLAR-GLV benchmark, a real-world dataset collected specifically for high dynamic range tunnel scenarios. Extensive experiments demonstrate that our method consistently outperforms intensity-only baselines, reducing geometric reconstruction error by 24.2% in high-glare tunnel exit zones and 10.0% at tunnel entrances. Crucially, compared to multi-stream fusion architectures, these performance gains come with negligible additional computational cost, making the framework highly suitable for resource-constrained onboard inference environments.
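
To make the gating idea concrete, here is a minimal per-pixel sketch. The Stokes and DoLP formulas are the standard ones for intensities measured behind 0°, 45°, 90°, and 135° polarizers. Everything about the gate itself is an assumption for illustration: the function names, the linear ramps, the thresholds (`sat_start`, `dolp_lo`, `dolp_hi`), and the normalized weighting are hypothetical, and the abstract does not specify which four polarization channels the network consumes or how the gate is parameterized.

```python
import math


def stokes_from_angles(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities behind 0/45/90/135 degree polarizers."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical component
    s2 = i45 - i135                     # diagonal components
    return s0, s1, s2


def dolp(s0, s1, s2, eps=1e-8):
    """Degree of Linear Polarization, clipped to [0, 1]."""
    return min(1.0, math.hypot(s1, s2) / max(s0, eps))


def ramp(x, lo, hi):
    """Linear ramp: 0 below lo, 1 above hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def geometry_weight(intensity, dolp_value, sat_start=0.85,
                    dolp_lo=0.02, dolp_hi=0.10, eps=1e-8):
    """Weight given to the polarization (geometric) cue at one pixel.

    Assumed rule: trust polarization more as the normalized intensity
    approaches saturation, and not at all when DoLP is too low to be valid.
    """
    saturation = ramp(intensity, sat_start, 1.0)  # appearance cue unreliable
    validity = ramp(dolp_value, dolp_lo, dolp_hi)  # polarization cue reliable
    return validity / (validity + (1.0 - saturation) + eps)


def fuse(appearance, geometry, weight):
    """Convex combination of the two cues."""
    return (1.0 - weight) * appearance + weight * geometry
```

Under these assumed thresholds, a saturated pixel with strong polarization leans almost entirely on the geometric cue, a well-exposed pixel with valid polarization splits the weight evenly, and a pixel with near-zero DoLP falls back to appearance alone. This sketch covers only the two signals the abstract names; it does not model low-illumination regions separately.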
