Diabetes mellitus is a global health challenge associated with complications such as cardiovascular disease, vision impairment, and kidney failure. Early detection and accurate prediction of diabetes risk therefore play a significant role in improving disease management and minimising long-term health complications. Individual machine learning methods applied to this problem have limitations such as overfitting: high variance makes a model overly sensitive to specific features of the training data and reduces its generalisation capability. This study aimed to address this issue by applying a stacked ensemble learning technique to enhance diabetes classification performance on the Pima Indian Diabetes dataset. The ensemble combined five base learners, namely Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbours (KNN), and Gradient Boosting Machine (GBM), with Logistic Regression as the meta-learner. The base models were trained using 10-fold cross-validation to ensure robustness and minimise overfitting. The stacked ensemble achieved an average AUC of 0.84 with a standard deviation of 0.05 across all folds, indicating stable predictive performance. To improve interpretability, SHapley Additive exPlanations (SHAP) was used to analyse the contribution of individual features, identifying features such as Glucose and Body Mass Index (BMI) as influential in predicting diabetes risk. SHAP was also used to analyse the contribution of the base learners to the meta-learner's predictions, and showed that Gradient Boosting and Random Forest exerted a stronger influence on the stacked ensemble than the other base learners. Overall, the stacking ensemble provided a robust and reliable approach to improving diabetes classification performance.
Furthermore, the integration of explainable artificial intelligence techniques, such as SHAP, improves model transparency and interpretability for healthcare professionals.
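
The stacking set-up summarised above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes scikit-learn, uses a synthetic eight-feature dataset from `make_classification` as a stand-in for the Pima data, and picks illustrative hyperparameters (50 trees, tree depth 4, 5 inner folds) that the abstract does not specify. The helper names `build_stack` and `fold_aucs` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stack(seed=0):
    """Five base learners feeding a logistic-regression meta-learner."""
    base_learners = [
        # SVM and KNN are scale-sensitive, so they are wrapped with a scaler.
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("dt", DecisionTreeClassifier(max_depth=4, random_state=seed)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ]
    # The meta-learner is fitted on out-of-fold predictions of the base
    # learners (inner cv=5 is an assumption, not taken from the study).
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def fold_aucs(X, y, n_splits=10, seed=0):
    """Return one ROC AUC per outer cross-validation fold."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(build_stack(seed), X, y, cv=cv, scoring="roc_auc")


# Synthetic stand-in data: 8 features, binary outcome (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
aucs = fold_aucs(X, y)
print(f"AUC across folds: mean={aucs.mean():.2f}, std={aucs.std():.2f}")
```

Reporting the mean and standard deviation of the per-fold AUC mirrors how the stability of the ensemble is summarised in the abstract; the numbers printed here come from synthetic data and are not comparable to the reported 0.84 and 0.05.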
Macharia et al. (Wed,) studied this question.