RGB-depth underwater salient object detection (USOD) poses considerable challenges, such as uneven lighting, visual interference, and image blur, which limit the effectiveness of traditional approaches. The segment anything model (SAM), known for its robust segmentation capabilities, offers a promising alternative. However, SAM depends on prompt labels (e.g., points, boxes, and masks) to perform effective resources typically unavailable in USOD datasets. To address this, we propose SAM-CLNet, a prompt-free, SAM-enhanced collaborative learning network comprising three main components: 1) SAM; 2) a mask prompt generator (MPG); and 3) a region-aware attention collaborative learning loss (RCL). In our framework, pseudo-mask prompts generated by MPG were used as input prompts for SAM, helping to offset performance degradation due to the absence of manual labels. Simultaneously, RCL leveraged high-quality SAM predictions to refine MPG, enhancing its feature extraction while minimizing the impact of low-quality pseudo-prompts on SAM. This cyclic feedback mechanism facilitated mutual improvement in detection accuracy. In addition, we introduced a U-Adapter module to adapt SAM for underwater imagery and incorporated a frequency cross-attention fusion module in MPG to integrate RGB and depth information. The region-aware attention in RCL further targeted challenging regions by comparing SAM's predictions with MPG's pseudo-mask. Experiments on the USOD10K and USOD datasets demonstrated that SAM-CLNet outperformed existing methods and generalized effectively across five public salient object detection (SOD) benchmarks. Code is available at https://github.com/BeibeiIsFreshman/SAM-USOD.
Zhou et al. (Thu,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: