Abstract

In recent years, machine learning has come to play a central role in data-driven materials development. This study presents a feature extraction method for improving the predictive accuracy of regression models. The method is built on SHapley Additive exPlanations (SHAP) values, which are commonly used to interpret black-box models, and is examined to determine whether it can transfer the high expressiveness of an accurate regression model to underfitting regression models. The results show that SHAP values capture information that is valuable for regression analysis, leading to improved predictive accuracy. They also underscore the importance of selecting the base model from which SHAP values are extracted, because the effectiveness of the extracted features depends strongly on that model. Random forest performed best as a base model for SHAP-based feature extraction regardless of the specific SHAP explainer used, presumably because of its ability to capture complex non-linear relationships. In addition, the proposed method can improve the efficiency of material exploration during Bayesian optimization.
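The idea of passing an expressive model's knowledge to an underfitting one through SHAP values can be illustrated with a minimal sketch. It is not the study's implementation, and several choices are assumptions made only for illustration: the synthetic data, a quadratic least-squares model standing in for the base model (the study reports random forest as the best base model), a brute-force enumeration of exact Shapley values in place of a SHAP explainer, and the decision to append the SHAP values to the original features rather than replace them. All function names (`quad_features`, `exact_shap`, `linear_mse`) are hypothetical.

```python
import itertools
import math
import numpy as np

def quad_features(X):
    """Expand X into [1, x_i, x_i * x_j (i <= j)] for a quadratic model."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def fit_lstsq(F, y):
    """Ordinary least squares coefficients for design matrix F."""
    return np.linalg.lstsq(F, y, rcond=None)[0]

def exact_shap(predict, X, background):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition are filled in from the background rows
    (an interventional value function). Cost grows as 2**d, so this is
    only practical for a handful of features.
    """
    n, d = X.shape
    m = background.shape[0]

    def value(subset):
        mixed = np.repeat(background[None, :, :], n, axis=0)  # (n, m, d)
        for j in subset:
            mixed[:, :, j] = X[:, [j]]  # take feature j from the explained row
        return predict(mixed.reshape(-1, d)).reshape(n, m).mean(axis=1)

    values = {S: value(S)
              for k in range(d + 1)
              for S in itertools.combinations(range(d), k)}
    phi = np.zeros((n, d))
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for S in itertools.combinations(others, k):
                with_i = tuple(sorted(S + (i,)))
                phi[:, i] += w * (values[with_i] - values[S])
    return phi

# Synthetic, purely non-linear target (assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = X[:, 0] ** 2 + X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=300)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

# Expressive base model (stand-in for e.g. a random forest).
w_base = fit_lstsq(quad_features(X_tr), y_tr)
base_predict = lambda A: quad_features(A) @ w_base

# SHAP values of the base model become extra features.
background = X_tr[:50]
phi_tr = exact_shap(base_predict, X_tr, background)
phi_te = exact_shap(base_predict, X_te, background)

def linear_mse(A_tr, A_te):
    """Fit a plain linear model on A_tr and return its held-out MSE."""
    add_bias = lambda A: np.column_stack([np.ones(len(A)), A])
    w = fit_lstsq(add_bias(A_tr), y_tr)
    return float(np.mean((add_bias(A_te) @ w - y_te) ** 2))

mse_plain = linear_mse(X_tr, X_te)
mse_shap = linear_mse(np.hstack([X_tr, phi_tr]), np.hstack([X_te, phi_te]))
print(f"linear on raw features:        MSE = {mse_plain:.4f}")
print(f"linear on raw + SHAP features: MSE = {mse_shap:.4f}")
```

Because the SHAP values of each sample sum to the base model's prediction minus its mean prediction over the background rows, a linear model given those values as inputs can reproduce the base model's output, which is one way to see why an underfitting model may benefit from them. With a real tree-based base model, a dedicated SHAP explainer would replace the brute-force enumeration used here.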
Takuya Ehiro studied this question.