Stereo matching of remote sensing images is a crucial step in generating digital surface models in photogrammetry and remote sensing. Deep learning–based stereo matching methods currently offer significant advantages in both matching accuracy and efficiency over traditional methods. However, high-accuracy matching remains challenging because of temporal inconsistencies between the images of a remote sensing stereo pair, severe occlusions caused by high-rise buildings, and the large textureless areas common in urban environments. To address these challenges and improve disparity estimation accuracy, this paper proposes an end-to-end dynamic frequency iterative stereo matching network that integrates multi-frequency information and a self-attention mechanism for disparity estimation from high-resolution satellite stereo images. We design a high- and low-frequency iterative optimization module to capture low-frequency information from large textureless regions and high-frequency information from object edges. This module helps preserve disparity consistency in textureless areas and disparity discontinuities at object edges, thereby retaining more complete structural details. Furthermore, a self-attention layer propagates relevant information from surrounding areas to occluded regions, mitigating the severe occlusions caused by tall buildings. The proposed network was evaluated on the US3D and WHU-Stereo remote sensing data sets. Experimental results show that it achieves higher accuracy than existing state-of-the-art methods. It performs well in textureless and occluded areas, effectively reduces edge blurring, and recovers finer object details.
Zhu et al. (Thu,) studied this question.