Building height is a fundamental parameter for characterizing urban three-dimensional structure and supporting applications such as urban planning, population estimation, and energy assessment. However, traditional shadow-based height inversion methods often suffer from occlusion, shadow overlap, and orientation inconsistencies when applied to heterogeneous urban environments. This study proposes a single-image building height estimation method that explicitly incorporates spatial distribution characteristics to improve robustness and estimation accuracy. Shadow lengths are first extracted robustly using a fishnet–Pauta strategy and then converted to heights with a multi-scenario scaling coefficient model that accommodates different sun–sensor geometric configurations. Urban areas are then subdivided into high-rise, mid-to-high-rise mixed, and dense low-rise zones using DBSCAN clustering and a composite indicator system. For each spatial type, tailored optimization strategies (neighborhood-weighted correction, similarity-constrained local regression, and median smoothing) are applied to suppress systematic biases and local outliers. Experiments on 11,168 buildings across 13 Chinese cities demonstrate strong overall performance, with an MAE of 2.07 m, an RMSE of 2.56 m, and an R² of 0.99. The proposed method outperforms existing approaches and remains highly stable across diverse urban morphologies, providing a scalable solution for large-area building height mapping from a single high-resolution image.
Xie et al. (Thu,) studied this question.
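To make the shadow-length step in the abstract more concrete, the following is a minimal sketch of Pauta-criterion (3σ) outlier rejection applied to several shadow-length samples for one building, followed by the simplest flat-terrain conversion from shadow length to height. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of the median of the retained samples, and the single-angle relation H = L · tan(sun elevation) are all assumptions made here, and the multi-scenario scaling coefficient model for different sun–sensor geometries is not reproduced.

```python
import math
import statistics


def pauta_filter(samples, k=3.0):
    """Iteratively drop values farther than k sample standard deviations
    from the mean (Pauta / 3-sigma criterion). Hypothetical helper."""
    kept = list(samples)
    while len(kept) > 2:
        mean = statistics.fmean(kept)
        std = statistics.stdev(kept)
        inliers = [x for x in kept if abs(x - mean) <= k * std]
        if len(inliers) == len(kept):
            break  # nothing rejected in this pass, so stop
        kept = inliers
    return kept


def height_from_shadow(shadow_length_m, sun_elevation_deg):
    """Assumed simplest case: flat ground, full shadow visible,
    so H = L * tan(sun elevation)."""
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))


def estimate_height(samples, sun_elevation_deg):
    """Filter the shadow-length samples, then convert their median to a height.
    Taking the median is an assumption of this sketch."""
    kept = pauta_filter(samples)
    return height_from_shadow(statistics.median(kept), sun_elevation_deg)


if __name__ == "__main__":
    # 19 consistent shadow-length samples (metres) and one corrupted by an
    # overlapping shadow; the values are made up for illustration.
    shadow_samples = [20.0] * 19 + [60.0]
    print(estimate_height(shadow_samples, sun_elevation_deg=45.0))
```

Note that the 3σ rule only becomes effective with enough samples per building: with roughly ten or fewer measurements, a single outlier cannot lie more than three sample standard deviations from the mean, so nothing would be rejected.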