April 22, 2026 | Open Access

Investigating Tree‐Based Models Across Positive–Unlabeled Learning Frameworks for Crystal Synthesizability Prediction

Key Points

This research aims to improve crystal synthesizability prediction using Positive-Unlabeled learning strategies and tree-based models.
Formulated crystal synthesizability prediction as a Positive-Unlabeled (PU) learning problem.
Benchmarked nine PU learning strategies with four tree-based models on approximately 130,000 crystal structures from the Materials Project database.
Evaluated model performance with discovery-oriented metrics, using Precision@200 as the primary criterion.
PU learning strategies outperformed naïve classification approaches in early-stage discovery.
Thermodynamic stability, formation energy, structural compactness, and magnetic descriptors dominated the learned decision mechanisms.
The study enables more reliable prioritization of experimentally synthesizable materials.
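Precision@200 scores a ranking rather than a thresholded classifier: it asks what fraction of the 200 top-ranked candidates are confirmed positives. A minimal sketch follows, assuming each candidate has a model score and a binary label where 1 marks an experimentally synthesized (labeled) structure and 0 marks an unlabeled one. The function name, the tie handling (input order is kept), and the behavior when fewer than k items exist are illustrative choices; the study's exact evaluation protocol is not described here.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int = 200) -> float:
    """Return the fraction of the k highest-scoring items whose label is 1.

    Labels follow a PU convention: 1 = labeled positive (synthesized),
    0 = unlabeled. Ties keep the input order because sorted() is stable.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(scores) == 0 or k <= 0:
        raise ValueError("need at least one item and a positive k")
    k = min(k, len(scores))  # if fewer than k items exist, rank all of them
    # Indices of items ordered from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:k] if labels[i] == 1)
    return hits / k


# Toy usage with hypothetical scores; real use would pass model probabilities.
scores = [0.9, 0.1, 0.8, 0.4]
labels = [1, 0, 0, 1]
print(precision_at_k(scores, labels, k=2))  # 0.5
```

With k=200, the function counts known positives among the 200 highest-scoring candidates. Because unlabeled structures count as misses even if some are in fact synthesizable, the value is a conservative estimate under this labeling convention.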

Abstract

The computational prediction of crystal synthesizability remains a major challenge in data‐driven materials discovery, as most entries in large materials databases correspond to theoretical structures without experimental validation. This asymmetry creates a learning scenario in which only experimentally synthesized crystals are reliably labeled, while most structures remain unlabeled and ambiguous. In this work, crystal synthesizability prediction is formulated within a Positive–Unlabeled (PU) learning framework rather than as a conventional binary classification problem. We benchmark nine PU learning strategies combined with four tree‐based machine learning models, namely Random Forest, LightGBM, XGBoost, and CatBoost, using approximately 130 000 crystal structures from the Materials Project database. After physically consistent data preprocessing and feature selection, model performance is evaluated using discovery‐oriented metrics, with Precision@200 as the primary criterion. The results show that PU formulations outperform naïve binary classification approaches in early‐stage discovery, enabling more reliable prioritization of experimentally synthesizable materials. Model explanation analysis reveals that thermodynamic stability, formation energy, structural compactness, and magnetic descriptors dominate the learned decision mechanisms. Overall, this study provides a comparative evaluation of PU learning strategies and demonstrates their effectiveness as a discovery‐oriented framework for identifying experimentally viable crystal structures.
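The asymmetry described in the abstract, where only synthesized crystals carry labels, is what PU methods exploit. As one well-known illustration (not necessarily one of the nine strategies benchmarked, which are not listed here), Elkan and Noto's approach trains a probabilistic classifier g(x) to separate labeled from unlabeled structures and then divides its scores by the label frequency c = P(labeled | positive). This assumes labeled positives are selected completely at random from all positives. The sketch shows only the estimation and rescaling steps; the function names are hypothetical, and the plain numbers stand in for a trained tree model's predicted probabilities.

```python
from statistics import fmean
from typing import Sequence


def estimate_label_frequency(holdout_positive_scores: Sequence[float]) -> float:
    """Estimate c = P(labeled | positive).

    Elkan-Noto estimator: the mean score that a labeled-vs-unlabeled
    classifier g(x) assigns to held-out labeled positives.
    """
    if len(holdout_positive_scores) == 0:
        raise ValueError("need at least one held-out labeled positive")
    c = fmean(holdout_positive_scores)
    if not 0.0 < c <= 1.0:
        raise ValueError("estimated label frequency must lie in (0, 1]")
    return c


def adjust_scores(scores: Sequence[float], c: float) -> list[float]:
    """Rescale g(x) ~ P(labeled | x) into P(positive | x) = g(x) / c, capped at 1."""
    return [min(1.0, s / c) for s in scores]


# Hypothetical classifier outputs, e.g. probabilities from any tree model
# trained with labeled (synthesized) = 1 and unlabeled = 0.
holdout_positive_scores = [0.5, 0.6, 0.7]
unlabeled_scores = [0.3, 0.6, 0.06]

c = estimate_label_frequency(holdout_positive_scores)  # about 0.6
print(adjust_scores(unlabeled_scores, c))  # approximately [0.5, 1.0, 0.1]
```

Capping at 1 handles scores above c, which can occur on finite data. Because dividing by a constant preserves order (apart from ties created by the cap), this step mainly changes calibrated probabilities and decision thresholds rather than the ranking of candidates.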
