This study examines the impact of missing data imputation techniques on failure prediction. Existing studies have paid little attention to missing data, have rarely evaluated the overall performance of imputation techniques, and have often concentrated on the accuracy of machine learning (ML) classifiers. To address these gaps, this study describes missing data in detail, including the types of missing values and the advantages and limitations of different imputation techniques. It then compares six imputation techniques (mean imputation, multiple imputation, hot deck imputation, multivariate regression imputation, KNN imputation, and MissForest imputation) on two datasets of Polish companies. The first dataset comprises 5 910 companies, whereas the second contains 10 503 companies. The classifiers used are a multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), stacking, and AdaBoost, each optimized with a genetic algorithm (GA) to enhance model performance. Our objective is to combine the strengths of ML methods with those of a metaheuristic approach. The results indicate that XGBoost-GA combined with KNN imputation performed best on the second dataset, with an accuracy, type I error, recall, and F1 score of 98.77%, 0.16%, 99.84%, and 98.77%, respectively. Moreover, advanced imputation techniques can successfully treat missing data when paired with the hybrid XGBoost-GA model. These findings underline the value of pairing advanced imputation techniques with powerful ML models for failure prediction. The contributions of this study are the use of real-world datasets, the imputation of missing data, the application of hybrid models, and the treatment of other issues related to data preprocessing.
Madou et al. (Thu,) studied this question.
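To make the difference between a simple and a neighbor-based imputation technique concrete, the following minimal sketch contrasts mean imputation with KNN imputation on a toy table. It is an illustration under stated assumptions, not the pipeline used in this study: the data are invented, missing values are encoded as `None`, the helper names (`mean_impute`, `knn_impute`, `_distance`) are hypothetical, and the distance is a Euclidean distance over jointly observed features, rescaled by the share of features that both rows observe.

```python
import math

def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def mean_impute(rows):
    """Replace every missing value with its column mean."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def _distance(a, b):
    """Euclidean distance over features observed in both rows.

    The squared sum is rescaled by (total features / shared features) so that
    rows sharing few features are not favored. Returns inf if nothing is shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Replace each missing value with the mean of that feature over the
    k nearest rows that observe it; fall back to the column mean if no
    comparable neighbor exists."""
    means = column_means(rows)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
            else:
                filled[i][j] = means[j]
    return filled

# Invented toy data: two clusters of rows, with one missing value in each.
rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 9.0],
    [1.2, 1.9, 3.2],
    [5.1, None, 9.1],
    [4.9, 6.2, 8.9],
]

mean_filled = mean_impute(rows)
knn_filled = knn_impute(rows, k=2)

print("mean imputation:", mean_filled[1][2], mean_filled[4][1])
print("KNN imputation: ", knn_filled[1][2], knn_filled[4][1])
```

In this toy table, mean imputation fills each gap with a value that lies between the two clusters, whereas KNN imputation borrows from rows that resemble the incomplete one. That is one intuition for why a neighbor-based technique can preserve more structure for a downstream classifier, although whether it helps on a given dataset remains an empirical question.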