August 15, 2025Open Access

RAEM-SLAM: A Robust Adaptive End-to-End Monocular SLAM Framework for AUVs in Underwater Environments

Key Points

RAEM-SLAM significantly enhances AUV navigation accuracy in dynamic underwater conditions and improves monitoring capabilities.
The framework utilizes a Physics-guided Underwater Adaptive Augmentation method, generating pseudo-underwater images for training.
Residual Semantic–Spatial Attention Module effectively fuses semantic and spatial information to reduce noise interference.
The integration of multi-scale local details with global context improves AUV pose estimation accuracy, demonstrating promising results.

Abstract

Autonomous Underwater Vehicles (AUVs) play a critical role in ocean exploration. However, due to the inherent limitations of most sensors in underwater environments, achieving accurate navigation and localization in complex underwater scenarios remains a significant challenge. While vision-based Simultaneous Localization and Mapping (SLAM) provides a cost-effective alternative for AUV navigation, existing methods are primarily designed for terrestrial applications and struggle to address underwater-specific issues, such as poor illumination, dynamic interference, and sparse features. To tackle these challenges, we propose RAEM-SLAM, a robust adaptive end-to-end monocular SLAM framework for AUVs in underwater environments. Specifically, we propose a Physics-guided Underwater Adaptive Augmentation (PUAA) method that dynamically converts terrestrial scene datasets into physically realistic pseudo-underwater images for the augmentation training of RAEM-SLAM, improving the system’s generalization and adaptability in complex underwater scenes. We also introduce a Residual Semantic–Spatial Attention Module (RSSA), which utilizes a dual-branch attention mechanism to effectively fuse semantic and spatial information. This design enables adaptive enhancement of key feature regions and suppression of noise interference, resulting in more discriminative feature representations. Furthermore, we incorporate a Local–Global Perception Block (LGP), which integrates multi-scale local details with global contextual dependencies to significantly improve AUV pose estimation accuracy in dynamic underwater scenes. Experimental results on real-world underwater datasets demonstrate that RAEM-SLAM outperforms state-of-the-art SLAM approaches in enabling precise and robust navigation for AUVs.

Read Full Paperexternally

Demander à l'IA

Bookmark

View Full Paper