As data annotation costs keep rising and application demands diversify, traditional supervised learning faces two major problems: the difficulty of labeling data at scale and the resulting scalability bottleneck. Contrastive Self-Supervised Learning (CSSL) offers an effective way to extract deep features without labels: it constructs sample pairs and trains an encoder so that matching (positive) pairs lie close together in the feature space while, in most methods, non-matching (negative) pairs are pushed apart. This review examines Contrastive Predictive Coding (CPC), Simple Framework for Contrastive Learning of Visual Representations (SimCLR), Momentum Contrast (MoCo), Bootstrap Your Own Latent (BYOL), Supervised Contrastive Learning (SupCon), Swapping Assignments between Views (SwAV), and Self-Distillation with No Labels (DINO), and systematically surveys their theoretical frameworks and architectural improvements. The techniques covered include the InfoNCE loss, momentum encoders, and negative-free self-distillation. The paper compares the Top-1 accuracy of each method under ImageNet linear evaluation together with its computational cost, and summarizes the core technical differences and the strengths and weaknesses of each method in three tables. The comparison shows that SupCon and DINO approach or exceed traditional supervised pre-training under different batch settings, while lightweight methods such as BYOL and SwAV perform particularly well in resource-constrained scenarios. CSSL has thus made significant progress in representation quality and generalization ability, and it lays a solid foundation for subsequent research on the interpretability, multimodal fusion, and efficient deployment of deep learning models.
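To make two of the techniques named in the abstract concrete, the following is a minimal NumPy sketch of an InfoNCE-style loss and a momentum-encoder update. It is an illustration under stated assumptions, not the implementation used by any of the surveyed methods: the function names `info_nce_loss` and `momentum_update`, the use of in-batch negatives, cosine similarity, and the default values `temperature=0.1` and `m=0.999` are all choices made here for the example.

```python
import numpy as np


def info_nce_loss(queries, keys, temperature=0.1):
    """InfoNCE-style loss where keys[i] is the positive for queries[i].

    Assumption for this sketch: every other key in the batch serves as a
    negative (in-batch negatives), and similarity is cosine similarity.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)

    # Pairwise similarity logits, shape (batch, batch).
    logits = q @ k.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


def momentum_update(key_params, query_params, m=0.999):
    """Exponential-moving-average update of a key (momentum) encoder.

    Both arguments are lists of arrays with matching shapes; the result is
    a new list and the inputs are left unchanged.
    """
    return [m * kp + (1.0 - m) * qp for kp, qp in zip(key_params, query_params)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))
    # Two "views" of the same samples: the second is a slightly noisy copy.
    view_a, view_b = x, x + 0.01 * rng.normal(size=x.shape)
    print("aligned pairs   :", info_nce_loss(view_a, view_b))
    print("mismatched pairs:", info_nce_loss(view_a, np.roll(view_b, 1, axis=0)))
```

In this sketch the loss is low when each query is most similar to its own key and high when the pairing is scrambled, which is the behavior a contrastive objective relies on. The momentum update moves the key encoder's parameters slowly toward the query encoder's, so the keys change smoothly between training steps.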
Zhenghan Li (Tue,) studied this question.