Current RGB-D camouflaged object detection (COD) methods rely primarily on dense pixel-level annotations, which are expensive to produce. In this paper, we investigate weakly supervised RGB-D COD with scribble annotations to reduce this annotation cost. First, we design a Multimodal SAM-based Label Optimization (MSLO) strategy, which refines the initial results generated by the Segment Anything Model (SAM) through dual pixel-level and image-level optimization, thereby producing high-quality pseudo-labels. Second, we propose a Spatial Frequency Exploration Module (SFEM), which enhances feature representation by mining important features from both the spatial and frequency domains. Furthermore, we construct a Multi-Modal Cross-layer Fusion Module (MCFM), which fuses multi-modal features effectively and captures multi-scale contextual information. Extensive experiments demonstrate that our method outperforms most fully supervised RGB and RGB-D COD methods and surpasses state-of-the-art weakly supervised RGB COD methods.
Song et al. (Sun,) studied this question.