Abstract
Weakly Supervised Semantic Segmentation (WSSS) techniques frequently rely on pseudo-labels to train segmentation models in the absence of fully annotated data, thereby reducing annotation costs. However, their performance is highly sensitive to the quality and uncertainty of the pseudo-labels employed. In this extended study, we further investigate the effectiveness of integrating cross-supervision and contrastive learning over pixel-level pseudo-annotations in weakly supervised settings where only image-level labels are available. We revisit CSRM, a weakly supervised segmentation framework based on a multi-branch deep convolutional network. CSRM exploits reliable pseudo-labels to mutually enhance the classification and segmentation tasks, while incorporating both reliable and unreliable pseudo-labels into a contrastive representation learning scheme. In addition to standard benchmarks, this extended version evaluates CSRM on the HPA Single-Cell Classification dataset, a genuinely weakly supervised instance segmentation benchmark for protein localization in single cells. Empirical results demonstrate that CSRM achieves competitive performance on Pascal VOC 2012 (75.0% mIoU) and MS COCO 2014 (50.4% mIoU), and yields substantial improvements over the baseline on the HPA dataset.
David et al. (Wed,) studied this question.