Stereo vision uses binocular imagery to perceive three-dimensional (3D) structure, emulating the human visual system: it estimates disparity from rectified image pairs and converts that disparity to depth through geometric triangulation. In recent years, deep learning-based stereo matching has advanced significantly in accuracy, efficiency, and generalization, surpassing traditional methods and showing strong potential for remote sensing applications. However, stereo matching in remote sensing faces challenges that are uncommon in terrestrial datasets. These include limited access to satellite imagery, seasonal differences between the images of a pair, difficulty in identifying small objects, and large regions with repetitive textures, such as lakes and forests. Unlike prior surveys, which primarily address ground-level scenes, this paper presents a comprehensive review of stereo matching techniques tailored to remote sensing. It synthesizes the progress and limitations of representative models, analyzes the characteristics and domain-specific constraints of remote sensing stereo datasets, and outlines future research directions and application prospects in this field.
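As a brief illustration of the disparity-to-depth step mentioned above, the following minimal sketch assumes an idealized, rectified pinhole stereo pair, for which depth follows Z = f · B / d. The function name `disparity_to_depth` and its parameters are hypothetical and chosen only for this example; satellite imagery generally requires a more elaborate sensor model, so this should be read as the textbook ground-level case rather than a method described in this paper.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for one pixel of a rectified pinhole stereo pair.

    Assumes the idealized relation Z = f * B / d, where
      f = focal length in pixels,
      B = baseline (distance between the two camera centres) in metres,
      d = disparity in pixels.
    All names here are illustrative assumptions, not taken from the paper.
    """
    if disparity_px <= 0:
        # Zero (or invalid negative) disparity corresponds to a point at
        # infinity in this model, so report an unbounded depth.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Example: f = 1000 px, B = 0.5 m, d = 10 px
    print(disparity_to_depth(10.0, 1000.0, 0.5))
```

Because depth is inversely proportional to disparity in this model, a fixed matching error in pixels produces a larger depth error for distant points than for nearby ones.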
Li et al. (Sat,) studied this question.