As healthcare informatics expands rapidly, methods for handling missing data in classification tasks need rigorous evaluation. Using the PIMA Indian Diabetes dataset, this research provides a thorough analysis of data imputation methods for diabetes classification. We evaluate four popular imputation techniques: Multivariate Imputation by Chained Equations (MICE), k-Nearest Neighbours (KNN), mean, and median. Each technique is paired with a variety of machine learning classifiers, including Decision Tree, Random Forest, Support Vector Classifier (SVC), and Gaussian Naive Bayes. Our objective is to show how the choice of imputation technique influences the predictive accuracy of classifiers in the context of diabetes diagnosis.
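The comparison described above can be sketched as a grid of imputer and classifier pairs scored by cross-validated accuracy. The following is a minimal illustration and not the authors' pipeline: it assumes scikit-learn, uses synthetic data with randomly masked values in place of the diabetes dataset, and stands in scikit-learn's `IterativeImputer` (a MICE-inspired method that returns a single imputation by default) for MICE. The function names, the 10% missingness rate, the scaling step, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_imputers():
    """Four imputation strategies; IterativeImputer is a MICE-style stand-in."""
    return {
        "MICE": IterativeImputer(max_iter=10, random_state=0),
        "KNN": KNNImputer(n_neighbors=5),
        "Mean": SimpleImputer(strategy="mean"),
        "Median": SimpleImputer(strategy="median"),
    }


def make_classifiers():
    """Four classifiers with illustrative (assumed) hyperparameters."""
    return {
        "DecisionTree": DecisionTreeClassifier(random_state=0),
        "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
        "SVC": SVC(),
        "GaussianNB": GaussianNB(),
    }


def evaluate(X, y, cv=5):
    """Return mean cross-validated accuracy for every imputer/classifier pair."""
    scores = {}
    for imp_name, imputer in make_imputers().items():
        for clf_name, clf in make_classifiers().items():
            # Imputer and scaler sit inside the pipeline, so they are fit
            # on each training fold only and never see the held-out fold.
            pipe = make_pipeline(imputer, StandardScaler(), clf)
            fold_scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
            scores[(imp_name, clf_name)] = fold_scores.mean()
    return scores


# Synthetic stand-in data: 8 numeric features, about 10% of entries set to NaN.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

scores = evaluate(X, y)
for (imp_name, clf_name), acc in sorted(scores.items()):
    print(f"{imp_name:>6} + {clf_name:<12} accuracy = {acc:.3f}")
```

Placing the imputer inside the pipeline is a deliberate choice in this sketch: fitting it on the full dataset before splitting would leak information from the test folds into the imputed training values and could inflate the reported accuracy.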
Jain et al. studied this question.