Robust and precise localization and visualization systems are critical for the efficient execution of close-proximity missions by unmanned underwater vehicles. However, purely visual Simultaneous Localization and Mapping (SLAM) often suffers from instability and sparse mapping in complex underwater environments. To address these limitations, this paper presents a novel multimodal fusion framework that integrates SLAM with 3D Gaussian splatting. Specifically, measurements from a stereo camera, a Doppler Velocity Log (DVL), and an Inertial Measurement Unit (IMU) are tightly coupled to maintain operational stability in visually degraded regions. A joint DVL–IMU online calibration and initialization scheme is implemented to enhance cross-modal sensor fusion. In addition, we leverage DVL beam ranging to constrain the depth of point clouds, thereby improving map accuracy. For scene reconstruction, a prior-based point cloud densification strategy is designed to eliminate structural voids, and an underwater medium model is incorporated to mitigate light attenuation and scattering effects. Furthermore, a hybrid geometric and photometric loss function is introduced to jointly optimize camera poses and Gaussian attributes, ensuring high structural and color fidelity. Extensive evaluations on simulation, pool, and ocean datasets confirm that the proposed system achieves robust localization and high-precision reconstruction.
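The abstract does not specify the form of its underwater medium model. One common formulation in underwater imaging splits each observed color channel into a direct signal that decays exponentially with range and a backscatter term that saturates toward a veiling-light value. The sketch below assumes that form only to show how attenuation and scattering depend on range. The function names and the coefficients `beta_d`, `beta_b`, and `B_inf` are hypothetical, and this is not the paper's implementation. In a splatting pipeline, `J` and `z` might plausibly come from the rendered color and depth, but that is also an assumption.

```python
import math


def underwater_pixel(J, z, beta_d, beta_b, B_inf):
    """Observed value of one color channel under an ASSUMED
    attenuation-plus-backscatter medium model (illustrative only).

    J:      unattenuated scene radiance for this channel (0..1)
    z:      range from camera to the surface point, in metres
    beta_d: direct-signal attenuation coefficient (1/m), hypothetical
    beta_b: backscatter coefficient (1/m), hypothetical
    B_inf:  veiling light, i.e. backscatter at infinite range
    """
    # Direct signal loses energy exponentially with range.
    direct = J * math.exp(-beta_d * z)
    # Backscatter grows with range and saturates at B_inf.
    backscatter = B_inf * (1.0 - math.exp(-beta_b * z))
    return direct + backscatter


def restore_pixel(I, z, beta_d, beta_b, B_inf):
    """Invert the same assumed model: subtract backscatter, undo attenuation."""
    backscatter = B_inf * (1.0 - math.exp(-beta_b * z))
    return (I - backscatter) * math.exp(beta_d * z)


if __name__ == "__main__":
    # Hypothetical per-channel coefficients: red typically attenuates fastest.
    coeffs = {"r": (0.40, 0.30, 0.20), "g": (0.10, 0.20, 0.35), "b": (0.05, 0.15, 0.45)}
    for channel, (bd, bb, binf) in coeffs.items():
        observed = underwater_pixel(0.6, 3.0, bd, bb, binf)
        recovered = restore_pixel(observed, 3.0, bd, bb, binf)
        print(channel, round(observed, 4), round(recovered, 4))
```

Because the model is applied per channel, different coefficients per channel reproduce the familiar color cast of distant underwater surfaces, and the inverse recovers the unattenuated color when range and coefficients are known.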
Ding et al. studied this question.