What question did this study set out to answer?

The aim is to improve monocular height estimation by incorporating geometric structural information from remote sensing images.

June 4, 2026 · Open Access

GA-DPNet: A Geometry-Aware Dense Prediction Network for Monocular Height Estimation from Remote Sensing Images

Key Points

Developed the Geometry-Aware Dense Prediction Network (GA-DPNet) using DINOv2 as the encoder network.
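Implemented the Multi-Scale Geometric Alignment (MSGA) module to reduce geometric inconsistency among the features DINOv2 extracts at different levels.
Created the Multi-component Geometric Constraint Loss (MGCL), combining a geometric consistency loss and a physical constraint loss, to improve the geometric plausibility of predictions.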
Achieved mean absolute errors (MAEs) of 0.783, 1.022, and 0.835 on the Vaihingen, Potsdam, and DFC2019 datasets, respectively.
Demonstrated improved accuracy and edge preservation in height estimation compared to existing models.
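The MAE figures above are the mean of the absolute per-pixel differences between predicted and reference heights. A minimal sketch of that metric over flattened height maps follows. The optional `valid` mask is a hypothetical addition (the summary does not say how no-data pixels are handled), and heights are assumed to be in whatever unit the reference maps use.

```python
from typing import Optional, Sequence


def height_mae(pred: Sequence[float], target: Sequence[float],
               valid: Optional[Sequence[bool]] = None) -> float:
    """Mean absolute error between predicted and reference heights.

    `valid` is a hypothetical per-pixel mask (e.g. to skip no-data pixels);
    the source does not state whether or how invalid pixels are excluded.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if valid is None:
        valid = [True] * len(pred)
    elif len(valid) != len(pred):
        raise ValueError("valid mask must match the number of pixels")
    # Absolute error for every pixel the mask keeps
    errors = [abs(p - t) for p, t, v in zip(pred, target, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    return sum(errors) / len(errors)
```

For example, `height_mae([1.0, 2.5, 4.0], [1.5, 2.0, 4.0])` averages the errors 0.5, 0.5, and 0.0.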

Abstract

Monocular height estimation from remote sensing images plays a crucial role in urban planning, 3D reconstruction, and environmental monitoring. However, existing monocular height estimation networks rely primarily on implicit semantic features in remote sensing images and neglect the geometric structural information of ground objects. This limitation weakens edge preservation and the physical plausibility of predicted ground objects in complex scenes. To address this issue, we propose GA-DPNet, a Geometry-Aware Dense Prediction Network for monocular height estimation from remote sensing images. First, with DINOv2 adopted as the encoder network, a Multi-Scale Geometric Alignment (MSGA) module is designed to reduce the inconsistency in global geometric space among the features that DINOv2 extracts at different levels. Second, a Geometry-Aware Feature Fusion Block (GAFFB) is designed, consisting of a Geometric Feature Extractor (GFE), a Geometry-Aware Attention (GAA) module, and a Geometric Modulated Residual (GMR) module. By extracting four geometric features (gradient, curvature, planarity, and edges) and using them to modulate attention weights, GAFFB fuses multi-scale geometric information with semantic features more effectively during decoding. Finally, a Multi-component Geometric Constraint Loss (MGCL), comprising a geometric consistency loss and a physical constraint loss, is designed to enhance the geometric plausibility of the network’s predictions. Experimental results on three public datasets, Vaihingen, Potsdam, and DFC2019, show that GA-DPNet achieves MAEs of 0.783, 1.022, and 0.835, respectively, demonstrating superior performance in height estimation accuracy and edge preservation.
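The abstract names four geometric cues (gradient, curvature, planarity, and edges) that the GFE derives before they modulate attention in GAFFB. The summary does not give their formulas, so the sketch below uses hypothetical stand-ins computed on a single-channel height grid: a central-difference gradient magnitude, the absolute 4-neighbour discrete Laplacian as curvature, a 3x3 least-squares plane-fit residual mapped to a planarity score, and a thresholded gradient as the edge map. The function name `geometric_features` and the `edge_threshold` parameter are illustrative only, and the paper's GFE may operate on learned feature maps rather than raw heights.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def _clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, v))


def _at(h: Grid, r: int, c: int) -> float:
    """Read h[r][c] with edge replication so border pixels have neighbours."""
    return h[_clamp(r, 0, len(h) - 1)][_clamp(c, 0, len(h[0]) - 1)]


def geometric_features(h: Grid, edge_threshold: float = 1.0
                       ) -> Tuple[Grid, Grid, Grid, Grid]:
    """Return hypothetical (gradient, curvature, planarity, edges) maps.

    gradient  - central-difference gradient magnitude
    curvature - absolute 4-neighbour discrete Laplacian
    planarity - 1 / (1 + RMS residual of a least-squares plane on a 3x3 window)
    edges     - 1.0 where gradient exceeds edge_threshold, else 0.0
    """
    rows, cols = len(h), len(h[0])
    grad = [[0.0] * cols for _ in range(rows)]
    curv = [[0.0] * cols for _ in range(rows)]
    plan = [[0.0] * cols for _ in range(rows)]
    edge = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = (_at(h, r, c + 1) - _at(h, r, c - 1)) / 2.0
            gy = (_at(h, r + 1, c) - _at(h, r - 1, c)) / 2.0
            grad[r][c] = math.hypot(gx, gy)
            lap = (_at(h, r - 1, c) + _at(h, r + 1, c)
                   + _at(h, r, c - 1) + _at(h, r, c + 1) - 4.0 * h[r][c])
            curv[r][c] = abs(lap)
            # Fit z = a*dx + b*dy + k on a symmetric 3x3 window; because the
            # offsets are symmetric, the normal equations decouple.
            window = [(dx, dy, _at(h, r + dy, c + dx))
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            a = sum(dx * z for dx, _, z in window) / 6.0
            b = sum(dy * z for _, dy, z in window) / 6.0
            k = sum(z for _, _, z in window) / 9.0
            rms = math.sqrt(sum((z - (a * dx + b * dy + k)) ** 2
                                for dx, dy, z in window) / 9.0)
            plan[r][c] = 1.0 / (1.0 + rms)
            edge[r][c] = 1.0 if grad[r][c] > edge_threshold else 0.0
    return grad, curv, plan, edge
```

On a tilted plane, interior pixels get a constant gradient, zero curvature, and planarity 1.0. Across a height step such as a building wall, planarity drops below 1.0 and the edge map fires.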

Cite This Study

Yang et al. studied this question.

synapsesocial.com/papers/6a211591d499ed480b16ea05
https://doi.org/10.3724/2096-7004.di.2025.0398
