What question did this study set out to answer?

The study aims to improve the classification performance of diabetes risk using a stacked ensemble learning technique.

June 15, 2026 | Open Access

Improved Stacked Ensemble Technique in Enhancing the Classification of Diabetes Mellitus Patients

Key Points

The study aims to improve the classification performance of diabetes risk using a stacked ensemble learning technique.
Utilized Pima Indian Diabetes Data for analysis
Applied SVM, RF, DT, KNN, and GBM as base learners, with logistic regression as the meta-learner
Implemented 10-fold cross-validation to minimize overfitting
Achieved an average AUC of 0.84 with a standard deviation of 0.05
Gradient Boosting and Random Forest exerted greater influence on the ensemble's predictions than the other base learners
Integrated SHAP analysis improved interpretability of model predictions

Abstract

Diabetes mellitus is a global health challenge associated with complications such as cardiovascular disease, vision impairment, and kidney failure. Early detection and accurate prediction of diabetes risk therefore play a significant role in improving disease management and minimising long-term health complications. Individual machine learning methods applied to this problem exhibit limitations such as overfitting, which degrades performance through reduced generalisation capability and high variance and makes a model overly sensitive to specific data features. The study aimed to address this issue by applying a stacked ensemble learning technique to enhance diabetes classification performance on the Pima Indian Diabetes Data. The ensemble combined five base learners, Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbours (KNN), and Gradient Boosting Machine (GBM), with logistic regression as the meta-learner. The base models were trained using 10-fold cross-validation to ensure a robust model and minimise overfitting. The stacked ensemble achieved an average AUC of 0.84 with a standard deviation of 0.05 across all folds, indicating stable predictive performance. To improve interpretability, SHapley Additive exPlanations (SHAP) were used to analyse the contribution of individual features; Glucose and Body Mass Index (BMI) were influential in predicting diabetes risk. SHAP was also used to analyse the contribution of each base learner to the meta-learner's prediction, and showed that Gradient Boosting and Random Forest exerted a stronger influence on the stacked ensemble than the other base learners. Overall, the stacked ensemble provided a robust and reliable approach to improved diabetes classification. Furthermore, integrating explainable artificial intelligence techniques such as SHAP improves model transparency and interpretability for healthcare professionals.
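
The stacking setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset with the same shape as the Pima data (768 rows, 8 features) in place of the real file, and the hyperparameters, the feature scaling for SVM and KNN, and the inner 5-fold split used to build the meta-learner's inputs are all assumptions. The abstract does not say how the 10-fold cross-validation was nested, so here it is used as the outer evaluation loop.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in shaped like the Pima data (768 rows, 8 features).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

# The five base learners named in the abstract; all settings are illustrative.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC())),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# Logistic regression as meta-learner, trained on out-of-fold base predictions
# (inner cv=5 is an assumption made to keep the sketch quick to run).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Outer 10-fold stratified cross-validation, scored by AUC on each held-out fold.
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_auc = cross_val_score(stack, X, y, cv=outer_cv, scoring="roc_auc")

print(f"mean AUC = {fold_auc.mean():.2f}, std = {fold_auc.std():.2f}")
```

Because the data here are synthetic, the printed numbers will not match the reported 0.84 and 0.05; the point is only to show how a per-fold AUC mean and standard deviation of that kind can be obtained.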
