Foundation segmentation models such as the Segment Anything Model (SAM) generalize well on natural images but degrade substantially in underwater scenes, where color distortion, scattering, and low contrast collectively impair feature representation. Parameter-efficient fine-tuning strategies have been explored to adapt SAM to marine domains while preserving generalization, but degraded image quality still hampers feature extraction. Moreover, existing SAM-based underwater methods typically rely on ground-truth box prompts during inference. Because ground-truth boxes are unavailable in real-world underwater scenarios, this dependence yields evaluation results that do not reflect actual deployment conditions and limits practical applicability. To address these issues, Water-AutoSAM is introduced: a dual-domain enhanced auto-prompting framework tailored for underwater image segmentation. The auto-prompting mechanism decouples semantic and positional representations to generate generalized point prompts, which are optimized via enhanced sharpness, correctness, and diversity losses under staged training. To counter the degradation typical of underwater imagery, a lightweight module, SS-UIE, is integrated as a frozen pre-enhancement stage. It processes the image in parallel spatial and frequency branches and combines the two streams with a fixed residual fusion coefficient. Operating entirely without box prompts, Water-AutoSAM achieves competitive annotation-free performance, attaining 92.38% mIoU on SUIM and narrowing the gap to the fully supervised upper bound to 2.08% on COD10K.
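The dual-branch fusion described above can be illustrated with a minimal NumPy sketch. It is not the SS-UIE implementation: the abstract does not specify the branch internals or the fusion formula, so the mean filter, the Fourier amplitude gain, the value of `ALPHA`, and the rule `spatial + alpha * (frequency - input)` are all assumptions chosen only to show how a fixed, non-learned coefficient might combine a spatial stream with a frequency-domain residual.

```python
import numpy as np

ALPHA = 0.5  # hypothetical fixed fusion coefficient (not taken from the paper)


def spatial_branch(img):
    """Stand-in spatial branch: 3x3 mean filter with edge padding."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def frequency_branch(img, gain=1.2):
    """Stand-in frequency branch: scale all non-DC Fourier coefficients."""
    spec = np.fft.fft2(img)
    dc = spec[0, 0]
    spec = spec * gain
    spec[0, 0] = dc  # leave the mean intensity unchanged
    return np.real(np.fft.ifft2(spec))


def fuse(img, alpha=ALPHA):
    """Assumed fusion rule: spatial output plus a fixed-weight
    residual contributed by the frequency branch."""
    s = spatial_branch(img)
    f = frequency_branch(img)
    return s + alpha * (f - img)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))  # single-channel image with values in [0, 1)
    enhanced = fuse(image)
    print(enhanced.shape)
```

Because `alpha` is a constant rather than a trained parameter, the enhancement stage in this sketch has nothing to update during segmentation training, which is consistent with using it as a frozen pre-processing step.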
Sun et al. (Thu,) studied this question.