August 14, 2025

Diabetes Disease Prediction Using Machine Learning Classification Algorithms

Key Points

Logistic Regression achieved the highest accuracy, 97.53%, outperforming all other algorithms tested.
The study compared nine algorithms and applied feature selection techniques to reduce computational cost while keeping predictions accurate.
Models were assessed with 5-fold cross-validation using Accuracy, Precision, Recall, F1-Score, and ROC-AUC.
The findings underline the importance of machine learning in improving diabetes care in resource-limited settings.
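
The evaluation protocol in these key points (5-fold cross-validation scored by Accuracy, Precision, Recall, F1-Score, and ROC-AUC) can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code. The stratified fold assignment, the hypothetical `fit_predict` callback standing in for any classifier (for example Logistic Regression), and all function names are assumptions, since the summary does not say how folds were drawn or which libraries were used.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: (X_train, y_train, X_test) -> (predicted labels, positive-class scores)
FitPredict = Callable[[List[List[float]], List[int], List[List[float]]], Tuple[List[int], List[float]]]


def stratified_k_fold(y: Sequence[int], k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split row indices into k folds that preserve the class ratio of y (an assumed choice).

    Returns (train_indices, test_indices) pairs; each fold is the test set exactly once.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        # Deal this class's rows round-robin, continuing where the previous class stopped.
        for pos, i in enumerate(idx):
            folds[(offset + pos) % k].append(i)
        offset += len(idx)
    return [
        ([j for f, fold in enumerate(folds) if f != fold_no for j in fold], test)
        for fold_no, test in enumerate(folds)
    ]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes in the fold")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(fit_predict: FitPredict, X: List[List[float]], y: List[int],
                   k: int = 5, seed: int = 0) -> Dict[str, float]:
    """Average each metric over the k test folds."""
    totals: Dict[str, float] = {}
    for train, test in stratified_k_fold(y, k, seed):
        labels, scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                                     [X[i] for i in test])
        y_test = [y[i] for i in test]
        fold = binary_metrics(y_test, labels)
        fold["roc_auc"] = roc_auc(y_test, scores)
        for name, value in fold.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / k for name, total in totals.items()}
```

Stratifying keeps both classes in every fold, which ROC-AUC requires. As a quick consistency check on the F1 formula, F1 = 2PR / (P + R): with P = 0.9753 and R = 1 it gives 1.9506 / 1.9753 ≈ 0.9875, matching the F1-Score of 98.75% reported for Logistic Regression.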

Abstract

This study develops a diabetes prediction model that combines machine learning (ML) algorithms with feature selection techniques to improve predictive accuracy and computational efficiency. The dataset, drawn from 2023 diabetes screening records provided by the Yala Provincial Public Health Office in Thailand, was preprocessed with Min-Max and Z-Score normalization to put features on a consistent scale. Nine ML algorithms were evaluated: Logistic Regression (LR), Decision Trees (DT), Classification and Regression Trees (CART), Bayesian Classifiers, K-Nearest Neighbors (KNN), Random Forest (RF), Multilayer Perceptron (MLP), XGBoost, CatBoost, and Gradient Boosting Machines (GBM). Model performance was assessed with 5-fold cross-validation using Accuracy, Precision, Recall, F1-Score, and ROC-AUC. Logistic Regression outperformed all other models, achieving an accuracy of 97.53%, a precision of 97.53%, a recall of 100%, and an F1-Score of 98.75%. The study also compared two feature selection techniques, Principal Component Analysis (PCA) and Random Forest, to reduce computational complexity while maintaining performance; Random Forest proved more efficient than PCA for this purpose. These findings highlight the effectiveness of ML models for building scalable, real-time diabetes screening systems. The proposed approach is particularly beneficial in resource-limited healthcare settings where traditional diagnostic tools may be inaccessible. By strengthening clinical decision-support systems, this research makes a practical contribution to diabetes care, enabling earlier intervention and more accurate risk stratification for both patients and healthcare providers.
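
The two normalization schemes named in the abstract are x' = (x - min) / (max - min) for Min-Max and z = (x - mean) / std for Z-Score. The sketch below is a minimal standard-library version under stated assumptions: statistics are fitted on the training rows only and reused for the test rows (a common safeguard against leakage; the abstract does not say how the authors handled this), the standard deviation is the population one, and constant columns are mapped to 0. The function names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

Rows = List[List[float]]


def _columns(rows: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Transpose row-major data into feature columns."""
    return list(zip(*rows))


def min_max_scale(train_rows: Rows, test_rows: Rows) -> Tuple[Rows, Rows]:
    """Min-Max: x' = (x - min) / (max - min), with min and max taken from train_rows only."""
    bounds = [(min(c), max(c)) for c in _columns(train_rows)]

    def transform(rows: Rows) -> Rows:
        # A constant training column has max == min; map it to 0 to avoid dividing by zero.
        return [[(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
                for row in rows]

    return transform(train_rows), transform(test_rows)


def z_score_scale(train_rows: Rows, test_rows: Rows) -> Tuple[Rows, Rows]:
    """Z-Score: z = (x - mean) / std, using the population std of train_rows only."""
    stats = [(fmean(c), pstdev(c)) for c in _columns(train_rows)]

    def transform(rows: Rows) -> Rows:
        # Zero-variance columns carry no information; map them to 0.
        return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
                for row in rows]

    return transform(train_rows), transform(test_rows)
```

Because test rows reuse the training statistics, Min-Max values outside the training range can fall outside [0, 1]. For example, with a training column of 2, 4, 6, a test value of 8 maps to (8 - 2) / (6 - 2) = 1.5.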
