ABSTRACT The computational prediction of crystal synthesizability remains a major challenge in data‐driven materials discovery, because most entries in large materials databases are theoretical structures without experimental validation. This asymmetry creates a learning scenario in which only experimentally synthesized crystals are reliably labeled, while the remaining structures are unlabeled and may or may not be synthesizable. In this work, crystal synthesizability prediction is formulated as a Positive–Unlabeled (PU) learning problem rather than as a conventional binary classification problem. We benchmark nine PU learning strategies combined with four tree‐based machine learning models, namely Random Forest, LightGBM, XGBoost, and CatBoost, using approximately 130 000 crystal structures from the Materials Project database. After physically consistent data preprocessing and feature selection, model performance is evaluated with discovery‐oriented metrics, with Precision@200 as the primary criterion. The results show that PU formulations outperform naïve classification approaches in early‐stage discovery, enabling more reliable prioritization of experimentally synthesizable materials. Model explanation analysis shows that thermodynamic stability, formation energy, structural compactness, and magnetic descriptors dominate the learned decision mechanisms. Overall, this study provides a comparative evaluation of PU learning strategies and demonstrates their effectiveness as a discovery‐oriented framework for identifying experimentally viable crystal structures.
Aydın et al. studied this question.
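Precision@200, the primary criterion named in the abstract, can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function name `precision_at_k` and the 0/1 label encoding (1 = experimentally synthesized, 0 = unlabeled) are hypothetical. It ranks candidate structures by model score and reports the fraction of the top k that carry a positive label. Because unlabeled structures that reach the top k are counted as misses, under PU labels this value is a conservative estimate of the true precision, assuming the labeled positives are correct.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int = 200) -> float:
    """Fraction of the k highest-scored structures that are labeled positive.

    Hypothetical helper for illustration. labels: 1 = experimentally
    synthesized (labeled positive), 0 = unlabeled. Unlabeled structures in
    the top k count as misses, so under PU labels the result is a
    conservative (lower-bound) estimate of precision.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one scored structure is required")
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumption: if fewer than k candidates exist, evaluate on all of them.
    k = min(k, len(scores))
    # Rank indices by descending score; Python's sort is stable even with
    # reverse=True, so tied scores keep their input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    return hits / k


# Usage: four candidates, two of the top three are labeled synthesized.
print(precision_at_k([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], k=3))
```

Ranking by score and truncating at k reflects the discovery setting described in the abstract, where only a short list of top candidates would be passed on for experimental follow‐up; the tie‐breaking and clipping rules here are illustrative choices and may differ from those used in the study.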