Because of its high prevalence and incidence, diabetes is considered a significant public health concern. Since diabetes has no known cure, early diagnosis plays a vital role in managing the disease effectively. Feature scaling is an important step in pre-processing data before training a machine learning model. The datasets used to train machine learning models often contain features measured on very different scales, which can make comparisons between their values unequal: a feature with a large numeric range can dominate one with a small range. Feature scaling techniques address this by transforming the values onto comparable ranges, enabling fair comparisons among them. This study evaluates the effect of normalization, standardization, and no feature scaling on the performance of five machine learning models for diagnosing diabetes: random forest, naive Bayes, k-nearest neighbor (KNN), logistic regression, and support vector machine (SVM), all of which are supervised learning algorithms. Several open-source frameworks and libraries were used, including Jupyter Notebook, scikit-learn (SkLearn), pandas, NumPy, Matplotlib, and seaborn. The results indicate that the random forest model performed very well without any feature scaling. In contrast, the KNN and SVM models performed better when normalization was applied. The performance of the naive Bayes model did not change whether standardization, normalization, or no feature scaling was used. The study concludes that not all models require feature scaling to achieve optimal performance. Furthermore, its outcome suggests that the common view that distance-based and gradient descent-based algorithms are sensitive to feature scaling may not always hold. Finally, feature scaling significantly affects the performance of some models but not others.
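The experimental design described above (five supervised models, each trained with no scaling, normalization, and standardization) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses a synthetic dataset from `make_classification` in place of the study's diabetes data, default hyperparameters, `GaussianNB` as the naive Bayes variant, and mean 5-fold cross-validated accuracy as the metric. The names `SCALERS`, `MODELS`, and `evaluate` are hypothetical, and the numbers it prints will not reproduce the study's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a tabular diabetes dataset (assumption: the study's
# real data is not used). Columns are rescaled so they span very different
# numeric ranges, which is the situation feature scaling is meant to address.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=0)
X = X * np.array([1, 10, 100, 0.1, 1000, 5])

# The three pre-processing options compared in the study.
SCALERS = {
    "none": None,
    "normalization": MinMaxScaler,      # x' = (x - min) / (max - min)
    "standardization": StandardScaler,  # z = (x - mean) / std
}

# The five supervised models named in the abstract, with mostly default
# settings (assumption: the study's hyperparameters are not given here).
MODELS = {
    "random_forest": lambda: RandomForestClassifier(random_state=0),
    "naive_bayes": GaussianNB,
    "knn": KNeighborsClassifier,
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "svm": SVC,
}


def evaluate(X, y):
    """Return mean 5-fold cross-validated accuracy for every (model, scaler) pair."""
    results = {}
    for model_name, make_model in MODELS.items():
        for scaler_name, make_scaler in SCALERS.items():
            if make_scaler is None:
                steps = [make_model()]
            else:
                steps = [make_scaler(), make_model()]
            # Placing the scaler inside the pipeline fits it on each training
            # fold only, so test-fold statistics do not leak into the scaling.
            pipe = make_pipeline(*steps)
            scores = cross_val_score(pipe, X, y, cv=5)
            results[(model_name, scaler_name)] = scores.mean()
    return results


if __name__ == "__main__":
    for (model_name, scaler_name), acc in evaluate(X, y).items():
        print(f"{model_name:20s} {scaler_name:16s} {acc:.3f}")
```

Fitting the scaler inside the pipeline, rather than on the full dataset beforehand, is a deliberate design choice here: it keeps the comparison between scaling options free of information leaking from the validation folds.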
Ozsahin et al. studied this question.