This study analyzes how data imputation methods affect the performance of machine learning models, focusing on the challenges posed by missing data in real-world datasets. Handling missing data properly is crucial to the accuracy and reliability of predictive models. The research assesses how different imputation techniques affect model outcomes, providing insight into selecting appropriate methods for different scenarios and types of datasets. The study begins with an overview of related work on machine learning and data imputation methods, covering their performance, limitations, and applicability across contexts. We then describe our research methodology, including data collection, pre-processing steps, and the systematic application of multiple imputation techniques tailored to different data structures. Our analysis evaluates the impact of these imputation methods on a selected set of machine learning models across multiple metrics. The results reveal significant differences in model accuracy and performance depending on the imputation strategy employed. The study highlights the critical role of appropriate data imputation in maintaining and improving model performance and provides practical recommendations for applying it in machine learning workflows. Finally, we discuss the broader implications of our findings, suggesting best practices for data imputation and offering guidance to practitioners dealing with incomplete datasets. This research contributes to a deeper understanding of how missing-data handling influences the outcomes and robustness of predictive models.
Chebli et al. studied this question.
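
The abstract's central claim, that the choice of imputation strategy changes downstream results, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy column, the masked positions, and the helper names `impute` and `mean_absolute_error` are hypothetical and are not taken from the study, and it measures only how well imputed values reconstruct the hidden ones rather than the performance of a trained model.

```python
from statistics import mean, median


def impute(values, strategy):
    """Replace None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def mean_absolute_error(truth, estimate):
    """Average absolute difference between true and estimated values."""
    return mean(abs(t - e) for t, e in zip(truth, estimate))


# Hypothetical toy column containing one outlier (100).
truth = [10, 12, 11, 13, 12, 100, 11, 12]
# The same column with two entries hidden to simulate missing data.
masked = [10, None, 11, 13, None, 100, 11, 12]

# Reconstruction error of each strategy against the hidden true values.
errors = {
    strategy: mean_absolute_error(truth, impute(masked, strategy))
    for strategy in ("mean", "median")
}

for strategy, error in errors.items():
    print(f"{strategy}: {error:.3f}")
```

On this toy column the outlier pulls the mean away from the typical values, so median imputation reconstructs the hidden entries more closely. That ordering is specific to this example; on other data distributions or missingness patterns a different strategy may do better.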