To identify lead-free perovskite compounds with high compositional flexibility, we developed a band gap prediction model using machine learning (ML). We analyzed the input features of a crystal graph convolutional neural network (CGCNN) and evaluated which factors contribute to band gap prediction, revealing that B-site atomic features were dominant. Based on this analysis, we designed a highly interpretable feature set derived from compositional information and applied it to band gap prediction. Furthermore, we trained feature-generation ML models to predict structural features, such as cell volume and B-X bond distance, from compositional information, and added these predicted features to the inputs of a support vector regression (SVR) model. We confirmed that incorporating ML-generated structural features improved the accuracy of band gap prediction.
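The two-stage pipeline described above (feature-generation models that predict a structural descriptor from composition, whose outputs are appended to the SVR inputs) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the two compositional descriptors, the synthetic volume and band gap formulas, and the helper names `fit_linear` and `augment` are hypothetical, and both stages are linear models trained by gradient descent, with a linear SVR in primal form (epsilon-insensitive loss) standing in for whatever kernel SVR the authors used.

```python
import math


def predict(w, b, x):
    """Linear model output w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_linear(X, y, loss="squared", eps=0.05, lam=1e-3, lr=0.1, epochs=3000):
    """Full-batch (sub)gradient descent on mean loss + (lam / 2) * ||w||^2.

    loss="squared": least squares with an L2 penalty (ridge), used here as a
                    hypothetical structural-feature generator.
    loss="eps":     epsilon-insensitive loss, i.e. a linear SVR in primal form.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            r = predict(w, b, xi) - yi
            if loss == "squared":
                g = r  # derivative of 0.5 * r^2
            else:  # subgradient of max(0, |r| - eps)
                g = 0.0 if abs(r) <= eps else math.copysign(1.0, r)
            for j in range(d):
                gw[j] += g * xi[j] / n
            gb += g / n
        w = [wi - lr * gj for wi, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def augment(X, generators):
    """Append one ML-generated structural feature per (w, b) generator."""
    return [xi + [predict(w, b, xi) for w, b in generators] for xi in X]


# Toy data on a 6x6 grid. Both descriptors are hypothetical and scaled to
# [0, 1] (think B-site ionic radius and B-site electronegativity).
X = [[i / 5, j / 5] for i in range(6) for j in range(6)]
volume = [0.5 + 1.0 * r + 0.3 * chi for r, chi in X]                # synthetic
gap = [3.0 - 1.2 * v + 0.4 * chi for v, (_, chi) in zip(volume, X)]  # synthetic

# Stage 1: learn composition -> cell volume (feature generator).
vol_model = fit_linear(X, volume, loss="squared", lr=0.1)

# Stage 2: linear SVR on composition plus the generated volume feature.
X_aug = augment(X, [vol_model])
svr_w, svr_b = fit_linear(X_aug, gap, loss="eps", lr=0.05, epochs=4000)

vol_mae = sum(abs(predict(*vol_model, x) - v) for x, v in zip(X, volume)) / len(X)
gap_pred = [predict(svr_w, svr_b, x) for x in X_aug]
gap_mae = sum(abs(p - t) for p, t in zip(gap_pred, gap)) / len(X)
print(f"volume MAE={vol_mae:.3f}  gap MAE={gap_mae:.3f}")
```

Because both stages are linear here, the generated volume is itself a linear combination of the inputs and adds no new information; the sketch only shows how generated features flow into the regressor. In practice one would expect nonlinear feature generators and a nonlinear SVR kernel, for example a library model such as scikit-learn's `sklearn.svm.SVR` in place of `fit_linear`.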
Kobayashi-Kajikawa et al. studied this question.