Abstract
Circulating rare cells (CRCs), including circulating tumor cells (CTCs) and circulating cancer-associated fibroblasts (cCAFs), are valuable liquid biopsy biomarkers, yet their detection remains challenging because of their extreme rarity and morphological heterogeneity. Current identification methods rely predominantly on fluorescence-based imaging and on manual, time-consuming assessment by trained experts, which limits high-throughput analysis, reproducibility, and clinical implementation. Moreover, the strong dependence on subjective visual judgment makes CRC calling highly operator-dependent, introducing substantial inter- and intra-observer variability and complicating assay standardization across centers. To overcome these constraints, we developed a self-supervised deep learning framework that enables robust and interpretable detection of CRCs using minimal labeled data and with reduced dependence on fluorescence signals. Our approach employs a two-stage training strategy. In the first stage, a model is pretrained on large-scale white blood cell (WBC) datasets using contrastive learning, allowing it to learn generalizable morphological representations from abundant, morphologically similar cells. In the second stage, knowledge distillation transfers these learned representations into a lightweight student model, which is subsequently fine-tuned on limited CRC data. This distillation process significantly reduces model complexity while preserving detection accuracy, thereby enabling real-time inference suitable for clinical workflows. In samples from 27 patients with early-stage breast cancer, both CTCs and cCAFs were first identified manually by conventional fluorescence-based analysis. When applied to the same dataset, the proposed framework achieved CRC detection sensitivity and specificity exceeding 90% with minimal computational burden, and it showed high concordance with manual expert assessment.
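The abstract does not specify the loss functions or the evaluation code, so the following is only a minimal sketch of how the two training stages and the reported metrics could be expressed. It assumes a SimCLR-style NT-Xent contrastive loss for the WBC pretraining stage and a Hinton-style temperature-softened distillation loss for the teacher-to-student transfer; both are common choices, not confirmed as the authors' method. All function names, the temperatures, and the weighting factor `alpha` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent_loss(views_a, views_b, temperature=0.5):
    """Assumed stage-1 loss: views_a[i] and views_b[i] embed two augmentations of cell i."""
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same cell
        # Similarities to every other embedding; the anchor itself is excluded.
        logits = [cosine(z[i], z[j]) / temperature for j in range(2 * n) if j != i]
        pos_idx = pos if pos < i else pos - 1  # shift caused by dropping index i
        total += -math.log(softmax(logits)[pos_idx])
    return total / (2 * n)

def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Assumed stage-2 loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

def sensitivity_specificity(y_true, y_pred):
    """Binary labels (1 = CRC, 0 = WBC); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

In this sketch, `nt_xent_loss` is lower when the two views of the same cell have similar embeddings, and `distillation_loss` vanishes for `alpha=1.0` when the student reproduces the teacher's logits exactly. In practice these losses would be computed on tensors inside a deep learning framework rather than on Python lists.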
Compared with conventional fluorescence-based manual annotation, our approach offers substantial gains in speed, consistency, and scalability, while eliminating the inter-observer variability inherent to expert-dependent assessment. These results suggest that self-supervised representation learning combined with knowledge distillation provides a practical and clinically viable strategy for automated CRC detection, with potential applications in early cancer diagnosis, longitudinal disease monitoring, and treatment response assessment.
Citation Format: Hyeongjung Woo, Seonghwan Park, Jungmin Lee, Inkyu Moon, Minseok S. Kim. A lightweight self-supervised deep learning framework for automated detection of circulating tumor cells and cancer-associated fibroblasts [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 119.
Woo et al. studied this question.