Abstract

Galaxy morphology classification depends on precise characterization of structural features, yet expert classifications are scarce. Citizen science initiatives such as Galaxy Zoo provide millions of crowd-sourced labels, but traditional supervised models lack cross-dataset generalizability and few-shot adaptability. To address these limitations, we propose the Contrastive Language-Image Pretraining Model for Galaxy Morphology Classification (CLIP-GMC), a framework that uses the contrastive vision-text model CLIP to unlock the utility of citizen science labels for robust classification. CLIP's pretrained embedding space enables flexible, text-guided feature learning with strong transfer and zero-shot capabilities. CLIP-GMC involves two stages: (1) pretraining the model on synthetic image-text pairs generated from Sloan Digital Sky Survey data in Galaxy Zoo 2, with the hierarchical morphological decisions translated into descriptive text; and (2) validating the model on Galaxy MNIST (4 classes) and Galaxy10 DECaLS (DECam Legacy Survey; 10 classes) at training data ratios from 100% down to 1% to assess few-shot and cross-dataset transferability. Experimental results show a clear advantage in low-data regimes and competitive performance in high-data regimes. On Galaxy10 DECaLS, CLIP-GMC achieves 0.8390 accuracy/0.8350 macro-F1 with 50% of the data (comparable to full-data ConvNeXt-B: 0.8431/0.8250) and 0.7203/0.6991 with 1% of the data, outperforming supervised baselines and Zoobot2. With 100% of the data, its accuracy of 0.8431 matches ConvNeXt-B but is slightly lower than that of Zoobot2 (GZ-Evo: 0.8670). On Galaxy MNIST, 10% of the data yields 0.9234 accuracy, surpassing the full-data supervised result of 0.9052; zero-shot transfer reaches 0.6865/0.6474; and accuracy with 100% of the data (0.9321) is comparable to Zoobot2. These results highlight CLIP-GMC's data efficiency and generalizability, offering a robust solution for data-scarce galaxy morphology classification.
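As an illustration of stage (1), the translation of hierarchical morphological decisions into descriptive text might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `describe_galaxy`, the vote-fraction keys (`smooth`, `featured`, `spiral`, and so on), the 0.5 threshold, and the caption wording are all hypothetical, and the real Galaxy Zoo 2 decision tree has more questions and different column names.

```python
from typing import Dict


def describe_galaxy(votes: Dict[str, float], threshold: float = 0.5) -> str:
    """Turn vote fractions from a simplified decision tree into a caption.

    All keys and the threshold are hypothetical placeholders, not the
    actual Galaxy Zoo 2 schema.
    """
    # Top-level question: which of the three answers has the largest vote share?
    top_answers = ("smooth", "featured", "artifact")
    top = max(top_answers, key=lambda name: votes.get(name, 0.0))

    if top == "artifact":
        return "an image of a star or artifact"

    if top == "smooth":
        # Follow-up asked only on the smooth branch: how round is the galaxy?
        shapes = ("round", "in-between", "cigar-shaped")
        shape = max(shapes, key=lambda name: votes.get(name, 0.0))
        return f"a smooth, {shape} galaxy"

    # Featured/disk branch: keep only details whose vote share passes the threshold.
    caption = "a disk galaxy"
    if votes.get("edge_on", 0.0) >= threshold:
        return caption + " seen edge-on"

    features = []
    if votes.get("spiral", 0.0) >= threshold:
        features.append("spiral arms")
    if votes.get("bar", 0.0) >= threshold:
        features.append("a central bar")
    if features:
        caption += " with " + " and ".join(features)
    return caption


example = {"smooth": 0.10, "featured": 0.85, "artifact": 0.05,
           "spiral": 0.90, "bar": 0.20}
print(describe_galaxy(example))  # prints "a disk galaxy with spiral arms"
```

Walking the tree branch by branch means a follow-up answer contributes to the caption only when its parent answer was chosen, which mirrors how volunteers are shown questions. Captions of this kind could then be paired with the corresponding images for contrastive pretraining.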
CAI et al. (Thu,) studied this question.