What question did this study set out to answer?

The aim is to develop a prompt-free collaborative learning network that improves underwater object detection accuracy without relying on manual labels.

March 21, 2026

Prompt Then Refine: Prompt-Free SAM-Enhanced Collaborative Learning Network for Detecting Salient Objects in Underwater Images

Key Points

Proposed the SAM-CLNet framework combining the Segment Anything Model (SAM) and a mask prompt generator (MPG).
Implemented a region-aware attention collaborative learning loss (RCL) to refine feature extraction for pseudo-masks.
Adapted SAM for underwater imagery using a U-Adapter module and integrated RGB-depth information with frequency cross-attention.
SAM-CLNet outperformed existing methods in underwater salient object detection on the USOD10K and USOD datasets.
Demonstrated effective generalization across five public salient object detection benchmarks.

Abstract

RGB-depth underwater salient object detection (USOD) poses considerable challenges, such as uneven lighting, visual interference, and image blur, which limit the effectiveness of traditional approaches. The Segment Anything Model (SAM), known for its robust segmentation capabilities, offers a promising alternative. However, SAM depends on prompt labels (e.g., points, boxes, and masks) to perform effectively, and these resources are typically unavailable in USOD datasets. To address this, we propose SAM-CLNet, a prompt-free, SAM-enhanced collaborative learning network comprising three main components: 1) SAM; 2) a mask prompt generator (MPG); and 3) a region-aware attention collaborative learning loss (RCL). In our framework, pseudo-mask prompts generated by MPG were used as input prompts for SAM, helping to offset performance degradation due to the absence of manual labels. Simultaneously, RCL leveraged high-quality SAM predictions to refine MPG, enhancing its feature extraction while minimizing the impact of low-quality pseudo-prompts on SAM. This cyclic feedback mechanism facilitated mutual improvement in detection accuracy. In addition, we introduced a U-Adapter module to adapt SAM for underwater imagery and incorporated a frequency cross-attention fusion module in MPG to integrate RGB and depth information. The region-aware attention in RCL further targeted challenging regions by comparing SAM's predictions with MPG's pseudo-mask. Experiments on the USOD10K and USOD datasets demonstrated that SAM-CLNet outperformed existing methods and generalized effectively across five public salient object detection (SOD) benchmarks. Code is available at https://github.com/BeibeiIsFreshman/SAM-USOD.
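The abstract does not give the exact form of RCL, so the following is only an illustrative sketch of the idea it describes: pixels where SAM's prediction and MPG's pseudo-mask disagree receive a larger weight when MPG is refined against SAM's output. The function names, the `base` and `gain` parameters, and the choice of a weighted binary cross-entropy are all assumptions made for illustration, not the authors' formulation.

```python
import math

def region_aware_weights(sam_pred, mpg_mask, base=1.0, gain=4.0):
    """Hypothetical weighting: larger where the two maps disagree.

    Both inputs are 2D lists of probabilities in [0, 1].
    """
    return [
        [base + gain * abs(s - m) for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(sam_pred, mpg_mask)
    ]

def weighted_bce(pred, target, weights, eps=1e-7):
    """Weight-normalized binary cross-entropy of pred against target."""
    total = 0.0
    norm = 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            norm += w
    return total / norm

# Toy 2x2 maps (made-up values): SAM's output is treated as the target
# used to refine the MPG pseudo-mask.
sam_pred = [[0.9, 0.1], [0.8, 0.2]]
mpg_agree = [[0.9, 0.1], [0.8, 0.2]]
mpg_disagree = [[0.1, 0.1], [0.8, 0.2]]

w_agree = region_aware_weights(sam_pred, mpg_agree)
w_disagree = region_aware_weights(sam_pred, mpg_disagree)
loss_agree = weighted_bce(mpg_agree, sam_pred, w_agree)
loss_disagree = weighted_bce(mpg_disagree, sam_pred, w_disagree)
```

In this toy setup the single disagreeing pixel dominates the normalized loss, which mirrors the stated goal of focusing learning on challenging regions. A real implementation would operate on tensors and would likely treat SAM's prediction as a fixed target during the MPG update, but that detail is also an assumption here.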


Cite This Study

Zhou et al. studied this question.

synapsesocial.com/papers/69be35166e48c4981c673328 https://doi.org/10.1109/tnnls.2026.3673090
