Abstract: We present a fast, interpretable machine learning framework to classify early- and late-type galaxies in the COSMOS2025 catalog at 0 < z < 3, without relying on image-based training labels or computationally expensive structural fitting. Using the Santa Cruz semi-analytic model, we generate a training set with secure morphological labels defined by bulge-to-total mass ratio and specific star formation rate. We bridge the simulation-to-observation domain gap by injecting realistic photometric noise derived from COSMOS2025. A CatBoostClassifier trained on 66 broadband colors achieves excellent performance in the simulated domain, recovering late types with 98% precision and recall, and early types with 91% precision and 88% recall. Applied to 44,132 COSMOS2025 galaxies, the model reveals a striking bimodality: only ∼6% of galaxies receive intermediate probabilities (0.3 < P(early type) < 0.7), nearly identical to the fraction observed in the simulation. The remaining 94% are classified at high confidence, which demonstrates that broadband colors are a decisive morphological discriminant. Validation against independent bulge+disk decompositions yields 70% overall accuracy, with late types identified at 78% purity and 74% completeness. The most important color feature, F277W–F444W, reflects the expected optical/near-infrared contrast between old and young stellar populations. The full pipeline completes in under 30 minutes on standard hardware, demonstrating that simulation-trained, color-based classifiers offer a scalable, physically interpretable route to approximate morphology for large next-generation surveys.
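The feature construction and confidence statistic described above can be illustrated with a minimal sketch. It is not the authors' pipeline, and it rests on several assumptions: that the 66 colors are all pairwise combinations of 12 bands (12 choose 2 = 66), that noise is injected as Gaussian scatter in flux space, and that fluxes are in microjansky so the AB zeropoint is 23.9. The function names and the placeholder band list are hypothetical; only F277W and F444W are named in the text.

```python
import math
import random
from itertools import combinations


def pairwise_colors(mags):
    """Return {"A-B": mag_A - mag_B} for every unordered pair of bands.

    Assumption: the 66 colors are all pairs drawn from 12 bands.
    """
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in combinations(mags, 2)}


def add_flux_noise(mag, flux_sigma, zeropoint=23.9, rng=random):
    """Perturb an AB magnitude with Gaussian noise applied in flux space.

    Assumption: fluxes are in microjansky (AB zeropoint 23.9) and
    flux_sigma is a per-band uncertainty taken from the observed catalog.
    Returns None when the noisy flux is non-positive (a non-detection).
    """
    flux = 10 ** (-0.4 * (mag - zeropoint))
    noisy_flux = flux + rng.gauss(0.0, flux_sigma)
    if noisy_flux <= 0:
        return None
    return zeropoint - 2.5 * math.log10(noisy_flux)


def intermediate_fraction(p_early, lo=0.3, hi=0.7):
    """Fraction of objects with lo < P(early type) < hi."""
    inside = sum(1 for p in p_early if lo < p < hi)
    return inside / len(p_early)


# Placeholder band list: only F277W and F444W come from the text.
bands = ["F277W", "F444W"] + [f"band{i:02d}" for i in range(10)]
example_mags = {band: 24.0 + 0.1 * i for i, band in enumerate(bands)}
example_colors = pairwise_colors(example_mags)
```

In a full workflow, the noisy simulated colors would be passed to a gradient-boosted classifier, and `intermediate_fraction` would be evaluated on its predicted probabilities for both the simulated and the observed samples.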
Asadi et al. (Wed,) studied this question.