Abstract

Underwater images suffer from color distortion, low contrast, and blurred details caused by selective light absorption, scattering, and suspended particles in water, which severely limits the visual perception and autonomous operation of underwater robots. To address these issues, this paper proposes a multi‐scale underwater image enhancement network based on an improved VM‐Unet. Built around the Visual Mamba model, the network uses an asymmetric encoder‐decoder architecture and incorporates parallel Visual Mamba layers to strengthen long‐range dependency modeling. It also integrates an attention mechanism to construct a channel‐level multi‐scale feature fusion module, which dynamically combines features from different scales and improves the model's adaptability and robustness in complex underwater environments. Experiments on the UIEB dataset show that the proposed method outperforms traditional approaches and mainstream deep learning models in both subjective visual quality and objective evaluation metrics (PSNR, SSIM, UCIQE, and UIQM), with particular gains in color restoration, detail preservation, and noise suppression. Furthermore, the method processes a single frame in 0.2 s (about 5 frames per second), meeting the demands of real‐time visual enhancement for underwater robots.

© 2026 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
Huang et al. studied this question.
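The abstract lists PSNR among its objective metrics. As a reminder of what that number measures, the following is a minimal sketch of the standard PSNR definition, 10·log10(MAX² / MSE), applied to a toy pair of images. The function name `psnr`, the `max_value` parameter, and the pixel values are illustrative assumptions; the abstract does not describe the paper's exact evaluation protocol (for example, how color channels are handled or how scores are averaged over UIEB).

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities; max_value is the
    largest possible intensity (255 for 8-bit images).
    """
    if len(reference) != len(enhanced) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel grayscale example (made-up values, not data from the paper).
reference = [52, 55, 61, 59]
enhanced = [54, 55, 60, 57]
print(round(psnr(reference, enhanced), 2))  # prints 44.61
```

Higher PSNR means the enhanced image is numerically closer to the reference. Because PSNR requires a reference image, it complements the no-reference metrics UCIQE and UIQM that the abstract also reports.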