Abstract
Machine learning (ML) regression models are developed to predict the energy band gap of materials using only their chemical compositions and crystal structures. The study evaluates eight stand‐alone ML regression models and eleven ensemble models, four based on stacking and seven on bagging, incorporating Ridge regression with cross‐validation (RidgeCV) and Least Absolute Shrinkage and Selection Operator with cross‐validation (LassoCV). All models are trained on the matbench‐mp‐gap dataset from Matminer, which includes 106,113 compounds with energy band gaps computed using the generalized gradient approximation with the Perdew‐Burke‐Ernzerhof functional (GGA‐PBE). Feature engineering is performed using Matminer and Pymatgen, while feature importance and correlation are assessed through permutation importance and Pearson correlation, respectively. Among the stand‐alone models, the Random Forest (RF) model achieves the highest accuracy, with a maximum coefficient of determination (R²) of 0.943 and a root mean square error (RMSE) of 0.504 eV. The best ensemble model, based on bagging, reaches an R² of 0.948 and an RMSE of 0.479 eV. The model is applied to predict band gaps of new half‐Heusler (HH) compounds with 18 valence electrons. These predictions, obtained with little computational effort, are validated through density functional theory (DFT) calculations, accelerating the discovery of potential optoelectronic materials from a wide chemical space.
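The two accuracy metrics quoted in the abstract, R² and RMSE, can be illustrated with a minimal sketch. The function names and the band gap values below are hypothetical and chosen only for illustration; they are not taken from the paper or from the matbench‐mp‐gap dataset, and the authors' own evaluation code may differ.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the inputs (here eV)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical reference (e.g. DFT) and ML-predicted band gaps in eV.
gaps_reference = [0.0, 0.5, 1.2, 2.3, 3.1]
gaps_predicted = [0.1, 0.4, 1.0, 2.5, 3.0]

print(f"R2   = {r2_score(gaps_reference, gaps_predicted):.3f}")
print(f"RMSE = {rmse(gaps_reference, gaps_predicted):.3f} eV")
```

An R² close to 1 means the predictions explain most of the variance in the reference values, while the RMSE gives the typical size of the error in eV, which is why the two are reported together.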
Mukesh K. Choudhary
Aditya Raj
Gowri Sankar S
Advanced Theory and Simulations
Central University of Tamil Nadu
Choudhary et al. studied this question.
www.synapsesocial.com/papers/68c1ae7754b1d3bfb60e6767 — DOI: https://doi.org/10.1002/adts.202500771