Abstract
Sonar imaging is essential for underwater perception, yet image quality is often degraded by strong multiplicative speckle noise. Conventional supervised despeckling methods rely on clean reference images, which are typically unavailable in practical sonar scenarios. Self-supervised blind-spot networks (BSNs) remove the need for paired data, but their performance on sonar imagery remains limited for two main reasons. First, speckle noise is strongly spatially correlated, so noise leaks implicitly into the prediction through the neighboring pixels. Second, blind-spot masking removes the center pixel, causing an irreversible loss of local structural detail; this is especially severe in single-channel sonar images, where no inter-channel redundancy is available to compensate. To address these issues, we propose SAME, a self-supervised, semantic-guided sonar image despeckling framework with two complementary modules. The Multi-scale Mixture-of-Experts Gated (MOEG) module employs dynamic expert routing over heterogeneous receptive fields to decouple spatially correlated noise, while the Contextual Semantic Enhancement Module (CSEM) injects structural priors from a frozen self-supervised DINO backbone to compensate for the structural degradation caused by blind-spot masking. Extensive experiments on the DEBRIS and KLSG datasets show that SAME achieves superior speckle suppression and improved structural fidelity compared with existing methods, without requiring clean ground truth.
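To make the blind-spot idea and gated expert routing concrete, here is a minimal, dependency-free sketch under several stated assumptions. It is not the SAME architecture or the MOEG module. The speckle model (multiplicative L-look Gamma noise with unit mean) is an assumed textbook model, and it is spatially independent, unlike the correlated sonar speckle described above. The "experts" are hypothetical blind-spot mean filters with different radii, and the router is a hand-crafted, variance-based softmax gate standing in for a learned one. The function names `add_speckle`, `blind_spot_stats`, and `gated_blind_spot_denoise` are illustrative only.

```python
import math
import random


def add_speckle(clean, looks=4, seed=0):
    """Multiplicative speckle y = x * n with n ~ Gamma(L, 1/L), so E[n] = 1.
    Assumed L-look model; noise here is i.i.d., i.e. NOT spatially correlated."""
    rng = random.Random(seed)
    return [[v * rng.gammavariate(looks, 1.0 / looks) for v in row] for row in clean]


def blind_spot_stats(img, i, j, radius):
    """Mean and variance of the (2r+1)x(2r+1) window around (i, j),
    skipping the center pixel (the blind spot) and out-of-bounds pixels."""
    h, w = len(img), len(img[0])
    vals = [img[i + di][j + dj]
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)
            if (di, dj) != (0, 0) and 0 <= i + di < h and 0 <= j + dj < w]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


def gated_blind_spot_denoise(img, radii=(1, 2, 3), temperature=1.0):
    """Per-pixel soft mixture of blind-spot 'experts' with different receptive fields.
    Hypothetical gate: logits = -local_variance / temperature, so windows that
    straddle structure (high variance) get less weight. A learned router would
    replace this hand-crafted rule."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Both expert outputs and gate inputs exclude (i, j) itself.
            stats = [blind_spot_stats(img, i, j, r) for r in radii]
            logits = [-var / temperature for _, var in stats]
            m = max(logits)
            weights = [math.exp(l - m) for l in logits]  # numerically stable softmax
            z = sum(weights)
            out[i][j] = sum(wk / z * mean for wk, (mean, _) in zip(weights, stats))
    return out
```

The design choice worth noting is that the gate reads only the masked neighborhood: if it could see the center pixel, the estimate at (i, j) could depend on that pixel's own noisy value, and a self-supervised model could learn the identity mapping. The same sketch also illustrates the leakage problem named above: blind-spot estimation only works when the neighbors carry no information about the center pixel's noise, which spatially correlated speckle violates.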
Guo et al. (Mon,) studied this question.