Autonomous Underwater Vehicles (AUVs) play a critical role in ocean exploration. However, due to the inherent limitations of most sensors in underwater environments, achieving accurate navigation and localization in complex underwater scenarios remains a significant challenge. While vision-based Simultaneous Localization and Mapping (SLAM) provides a cost-effective alternative for AUV navigation, existing methods are primarily designed for terrestrial applications and struggle to address underwater-specific issues, such as poor illumination, dynamic interference, and sparse features. To tackle these challenges, we propose RAEM-SLAM, a robust adaptive end-to-end monocular SLAM framework for AUVs in underwater environments. Specifically, we introduce a Physics-guided Underwater Adaptive Augmentation (PUAA) method that dynamically converts terrestrial scene datasets into physically realistic pseudo-underwater images for the augmentation training of RAEM-SLAM, improving the system’s generalization and adaptability in complex underwater scenes. We also design a Residual Semantic–Spatial Attention Module (RSSA), which uses a dual-branch attention mechanism to fuse semantic and spatial information. This design enables adaptive enhancement of key feature regions and suppression of noise interference, resulting in more discriminative feature representations. Furthermore, we incorporate a Local–Global Perception Block (LGP), which integrates multi-scale local details with global contextual dependencies to improve AUV pose estimation accuracy in dynamic underwater scenes. Experimental results on real-world underwater datasets demonstrate that RAEM-SLAM outperforms state-of-the-art SLAM approaches in enabling precise and robust navigation for AUVs.
Wu et al. studied this question.
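The abstract does not give the equations behind PUAA, so the following is only a minimal sketch of the general idea of physics-guided conversion, assuming the widely used simplified underwater image formation model I_c = J_c * t_c + B_c * (1 - t_c) with transmission t_c = exp(-beta_c * d). The function name `to_pseudo_underwater`, the attenuation coefficients, and the background light values are hypothetical and chosen for illustration; they are not taken from RAEM-SLAM.

```python
import math

# Hypothetical per-channel attenuation coefficients (1/m) for R, G, B.
# Red light attenuates fastest in water; these values are illustrative only.
BETA = (0.60, 0.12, 0.08)
# Hypothetical veiling (background) light colour in [0, 1], bluish-green.
BACKGROUND = (0.05, 0.45, 0.55)


def to_pseudo_underwater(image, depth, beta=BETA, background=BACKGROUND):
    """Apply I_c = J_c * t_c + B_c * (1 - t_c), with t_c = exp(-beta_c * d).

    image: rows of (r, g, b) tuples in [0, 1], the in-air scene J.
    depth: rows of scene distances in metres, same layout as image.
    Returns a new image with the same layout.
    """
    out = []
    for img_row, depth_row in zip(image, depth):
        row = []
        for pixel, d in zip(img_row, depth_row):
            degraded = []
            for j, b_c, bg in zip(pixel, beta, background):
                t = math.exp(-b_c * d)  # transmission for this channel
                degraded.append(j * t + bg * (1.0 - t))
            row.append(tuple(degraded))
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[(1.0, 1.0, 1.0), (0.8, 0.2, 0.1)]]
    depth = [[0.0, 5.0]]
    for row in to_pseudo_underwater(image, depth):
        print(row)
```

In this sketch a pixel at zero distance is left unchanged, while distant pixels converge to the background colour, with the red channel fading first. An adaptive scheme such as the one the abstract describes would presumably vary the coefficients per sample rather than fixing them as done here.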