Reconstructing 3D surfaces from electro-optical satellite imagery is an important capability for generating high-quality digital elevation models at scale. Recently, Gaussian splatting has emerged as a state-of-the-art technique for 3D reconstruction from satellite imagery. However, because Gaussian splatting is optimized solely on RGB imagery, it is susceptible to errors caused by the radiometric inconsistencies and textureless regions common in satellite images. To address this, we propose a method for fusing Gaussian splatting with vision foundation models that is specifically tailored to satellite imagery. Although recent work has explored fusing Gaussian splatting with vision foundation models, it has been evaluated only on terrestrial datasets, which have more constrained illumination and smaller scene scales than multi-date satellite imagery. To account for these differences, we introduce a method for computing multiscale satellite image embeddings along with a per-image feature alignment module. Benchmarked on the IARPA 2019 Challenge Dataset, our method reduces mean reconstruction error from 1.65 m to 1.57 m, a 5.2% relative improvement over previous methods. These results demonstrate that vision foundation models can enhance the geometric accuracy of satellite-based 3D reconstruction.
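The relative improvement quoted above is the reduction in error divided by the baseline error. A minimal sketch of that arithmetic follows, using only the two rounded errors given in the text; the function name `relative_improvement` is illustrative and not taken from the paper.

```python
def relative_improvement(baseline_error_m: float, new_error_m: float) -> float:
    """Fractional reduction in error, measured relative to the baseline."""
    return (baseline_error_m - new_error_m) / baseline_error_m


# Rounded mean reconstruction errors as quoted in the text (metres).
baseline = 1.65
ours = 1.57

rel = relative_improvement(baseline, ours)
print(f"absolute reduction: {baseline - ours:.2f} m")
print(f"relative improvement: {100 * rel:.1f}%")
```

With the rounded values this comes to roughly 4.8%, so the reported 5.2% presumably reflects the unrounded errors. That is an assumption, not something stated in the source.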
Reed et al. studied this question.