What question did this study set out to answer?

The aim is to classify early- and late-type galaxies in the COSMOS2025 catalog using a machine learning framework without expensive image-based training.

April 26, 2026 · Open Access

COSMOS2025: Machine Learning Classification of Early- and Late-type Galaxies at 0 < z < 3

Key Points

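The authors developed a machine learning framework built on a CatBoostClassifier trained on 66 broadband colors.
The training set comes from the Santa Cruz semi-analytic model, with morphological labels defined by bulge-to-total mass ratio and specific star formation rate.
Realistic photometric noise derived from COSMOS2025 was injected into the simulated photometry to bridge the gap between simulation and observation.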
The classifier achieved 98% precision/recall for late types and 91% precision and 88% recall for early types.
Only ∼6% of galaxies received intermediate classification probabilities, indicating a bimodal distribution.
Validation against independent bulge+disk decompositions resulted in 70% overall accuracy.
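The metrics quoted in these key points can be made concrete with a short sketch. It computes per-class precision and recall, and the fraction of objects with intermediate probabilities using the 0.3 < P(Early type) < 0.7 window given in the abstract. The function names and the toy labels and probabilities are invented for illustration; they are not the study's code or data.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for one class label.

    precision = TP / (TP + FP): of everything predicted as `positive`, how much is right.
    recall    = TP / (TP + FN): of everything truly `positive`, how much is found.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def intermediate_fraction(p_early, lo=0.3, hi=0.7):
    """Fraction of objects whose P(early type) lies strictly between lo and hi."""
    return sum(1 for p in p_early if lo < p < hi) / len(p_early)


# Toy labels and probabilities, invented for illustration only.
y_true = ["late", "late", "late", "early", "early", "late", "early", "late"]
y_pred = ["late", "late", "early", "early", "late", "late", "early", "late"]
p_early = [0.02, 0.95, 0.10, 0.55, 0.88, 0.01, 0.97, 0.30]

early_precision, early_recall = precision_recall(y_true, y_pred, "early")
late_precision, late_recall = precision_recall(y_true, y_pred, "late")
frac_intermediate = intermediate_fraction(p_early)

print(f"early: precision={early_precision:.2f} recall={early_recall:.2f}")
print(f"late:  precision={late_precision:.2f} recall={late_recall:.2f}")
print(f"intermediate fraction: {frac_intermediate:.3f}")
```

Precision and recall correspond to the purity and completeness quoted in the abstract. A small intermediate fraction means most objects sit near P = 0 or P = 1, which is the bimodality the summary describes.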

Abstract

We present a fast, interpretable machine learning framework to classify early- and late-type galaxies in the COSMOS2025 catalog at 0 < z < 3, without relying on image-based training labels or computationally expensive structural fitting. Using the Santa Cruz semi-analytic model, we generate a training set with secure morphological labels defined by bulge-to-total mass ratio and specific star formation rate. We bridge the simulation-to-observation domain gap by injecting realistic photometric noise derived from COSMOS2025. A CatBoostClassifier trained on 66 broadband colors achieves excellent performance in the simulated domain, recovering late types with 98% precision/recall and early types with 91% precision and 88% recall. Applied to 44,132 COSMOS2025 galaxies, the model reveals a striking bimodality: only ∼6% of galaxies receive intermediate probabilities (0.3 < P(Early type) < 0.7), nearly identical to the fraction observed in the simulation. This demonstrates that broadband colors are a decisive morphological discriminant, with the remaining 94% classified at high confidence. Validation against independent bulge+disk decompositions yields 70% overall accuracy, with late types identified at 78% purity and 74% completeness. The most important color feature, F277W–F444W, reflects the expected optical/near-infrared contrast between old and young stellar populations. The full pipeline completes in under 30 minutes on standard hardware, demonstrating that simulation-trained color-based classifiers offer a scalable, physically interpretable route to approximate morphology for large next-generation surveys.
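The noise-injection and color-feature steps described in the abstract might look roughly like the sketch below. It rests on several assumptions not stated in the source: that noise is added as a Gaussian perturbation in flux space, that fluxes are in microjansky and converted to AB magnitudes, and that color features are formed from every pair of bands (66 happens to equal the number of distinct pairs from 12 bands, but the summary does not say how the 66 colors are constructed). The band list, flux values, and uncertainties are illustrative; only F277W and F444W are named in the source.

```python
import math
import random
from itertools import combinations


def add_flux_noise(flux, sigma, rng):
    """Perturb a noiseless model flux with Gaussian noise of standard deviation sigma."""
    return flux + rng.gauss(0.0, sigma)


def ab_magnitude(flux_ujy):
    """AB magnitude for a flux density in microjansky; None if the flux is not positive."""
    if flux_ujy <= 0:
        return None
    return -2.5 * math.log10(flux_ujy) + 23.9


def pairwise_colors(mags):
    """Build one color (mag_a - mag_b) for every pair of bands in a band -> magnitude dict."""
    colors = {}
    for a, b in combinations(mags, 2):
        if mags[a] is None or mags[b] is None:
            colors[f"{a}-{b}"] = None  # undefined when either band has no valid magnitude
        else:
            colors[f"{a}-{b}"] = mags[a] - mags[b]
    return colors


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Invented noiseless fluxes (microjansky) for one simulated galaxy.
true_flux = {"F115W": 1.0, "F150W": 1.5, "F277W": 2.0, "F444W": 4.0}
# Hypothetical per-band 1-sigma uncertainties standing in for survey-derived errors.
flux_err = {band: 0.05 for band in true_flux}

noisy_flux = {b: add_flux_noise(f, flux_err[b], rng) for b, f in true_flux.items()}
clean_colors = pairwise_colors({b: ab_magnitude(f) for b, f in true_flux.items()})
noisy_colors = pairwise_colors({b: ab_magnitude(f) for b, f in noisy_flux.items()})

print(len(noisy_colors), "colors from", len(true_flux), "bands")
print("pairs available from 12 bands:", math.comb(12, 2))
print("clean F277W-F444W:", round(clean_colors["F277W-F444W"], 3))
```

In a pipeline of this kind, the noisy colors would then be passed as the feature table to a gradient-boosted classifier such as the CatBoostClassifier named in the abstract. That step is omitted here to keep the sketch free of external dependencies.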

Cite This Study

Asadi et al. studied this question.

synapsesocial.com/papers/69edaafc4a46254e215b3354 https://doi.org/10.3847/1538-4357/ae4eca
