What question did this study set out to answer?

The goal is to enhance failure prediction through effective missing data imputation and optimized machine learning methods.

April 4, 2026 | Open Access

Advanced imputation techniques in failure prediction with optimized machine learning models using genetic algorithms

Key Points

The goal is to enhance failure prediction through effective missing data imputation and optimized machine learning methods.
Analyzed types of missing data and various imputation techniques.
Compared six imputation methods on two datasets of Polish companies.
Optimized machine learning models including MLP, XGBoost, stacking, and AdaBoost using genetic algorithms.
XGBoost-GA with KNN imputation showed the best performance on the second dataset.
That configuration achieved an accuracy of 98.77%, a type I error of 0.16%, a recall of 99.84%, and an F1 score of 98.77%.
Advanced imputation techniques, combined with the hybrid XGBoost-GA model, treated missing data effectively.
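
The four reported metrics can all be derived from a confusion matrix. The sketch below shows one way to compute them, treating "failed" as the positive class. The function name and the counts are hypothetical and are not taken from the study. The definition of type I error is also an assumption: it is taken here as the share of failed companies classified as healthy (1 − recall), which is consistent with the reported figures (0.16% + 99.84% = 100%), although the summary itself does not state the definition.

```python
def failure_metrics(tp, fn, fp, tn):
    """Compute evaluation metrics from confusion-matrix counts.

    'Failed' is treated as the positive class:
      tp = failed firms predicted as failed
      fn = failed firms predicted as healthy
      fp = healthy firms predicted as failed
      tn = healthy firms predicted as healthy
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Assumption: type I error = share of failed firms that were missed,
    # i.e. 1 - recall. Some authors define it as fp / (fp + tn) instead.
    type_i_error = fn / (tp + fn)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "type_i_error": type_i_error,
        "f1": f1,
    }


# Made-up counts for illustration only; not the study's results.
metrics = failure_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under this definition, recall and type I error always sum to 100%, so reporting both mainly serves readers who are used to one convention or the other.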

Abstract

This study examines the impact of missing data imputation techniques on failure prediction. Existing studies have paid little attention to missing data, have rarely examined the overall performance of imputation techniques, and have often concentrated on the accuracy of machine learning (ML) classifiers. To address these gaps, this study describes missing data in detail, including the types of missing values and the advantages and limitations of different imputation techniques. It then compares six imputation techniques (mean imputation, multiple imputation, hot deck imputation, multivariate regression imputation, KNN imputation, and MissForest imputation) on two datasets of Polish companies. The first dataset comprises 5 910 companies, whereas the second contains 10 503 companies. The classifiers are a multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), stacking, and AdaBoost, all optimized with genetic algorithms (GA) that can effectively enhance model performance. The objective is to combine the power of ML methods with a metaheuristic approach. The results indicate that XGBoost-GA integrated with KNN imputation performed best on the second dataset, with an accuracy, type I error, recall, and F1 score of 98.77%, 0.16%, 99.84%, and 98.77%, respectively. Moreover, advanced imputation techniques can successfully treat missing data when paired with the hybrid XGBoost-GA model. These findings underline the value of using advanced techniques to impute missing values and powerful ML models to predict failure. The contributions of this study are the use of real-world datasets, the imputation of missing data, the application of hybrid models, and the treatment of other data preprocessing issues.
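
To make the difference between the simplest and one of the more advanced techniques concrete, the following is a minimal sketch of mean imputation and KNN imputation in plain Python. It is not the authors' implementation: the function names, the toy data, and the choice of `k` are hypothetical, and the distance (Euclidean over the features both rows observe, rescaled by the share of features used) is one common convention assumed here for illustration.

```python
import math


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Euclidean distance over features observed in both rows.

    The squared sum is rescaled by (total features / shared features) so
    that rows compared on fewer features are not favoured.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Replace each None with the column mean over the k nearest donor rows.

    A donor is any other row that observes the missing column and shares at
    least one observed feature with the incomplete row. If no donor exists,
    the value is left as None.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] < math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Toy data, invented for illustration: two clusters and one incomplete row.
rows = [
    [1.0, 10.0],
    [1.1, 11.0],
    [5.0, 50.0],
    [5.2, 52.0],
    [1.05, None],
]
mean_filled = mean_impute(rows)
knn_filled = knn_impute(rows, k=2)
print("mean imputation:", mean_filled[4])
print("KNN imputation: ", knn_filled[4])
```

In this toy case the incomplete row sits in the low-valued cluster, so KNN imputation borrows from its two closest neighbours, whereas mean imputation pulls the value toward the average of both clusters. This is the kind of distortion that neighbour-based and model-based imputers are meant to reduce.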
