Diabetes mellitus is a serious chronic metabolic disease whose global prevalence continues to rise, posing significant challenges to healthcare systems and economies. Early prediction and risk assessment of diabetes are crucial for timely clinical intervention, individualized treatment, and public health decision-making. Although machine learning methods have made considerable progress in diabetes prediction, single models often struggle to balance predictive accuracy and robustness. Ensemble learning approaches, particularly stacking, have been shown to improve performance by leveraging the complementary strengths of multiple base learners. However, conventional stacking methods typically employ fixed weights or rely solely on a meta-learner, without fully accounting for sample-level differences in prediction uncertainty. This study uses a publicly available diabetes dataset from Kaggle consisting of 2,768 samples. The dataset was split into training and testing sets at a 7:3 ratio and preprocessed using z-score normalization. We compared ten commonly used machine learning models (LR, SVM, KNN, NB, MLP, RF, DT, AdaBoost, XGBoost, and LightGBM), traditional stacking, and the proposed Sample-level Entropy-weighted Stacking method (SLE-Stacking). SLE-Stacking uses information entropy to quantify the predictive uncertainty of each base learner at the individual sample level and dynamically assigns fusion weights accordingly, thereby achieving more robust integration. Experimental results show that SLE-Stacking outperformed both the single models and traditional stacking across multiple evaluation metrics, including accuracy, precision, recall, F1-score, and AUC. Specifically, SLE-Stacking achieved an accuracy of 0.987, a precision of 0.993, an F1-score of 0.981, and an AUC of 0.990, with particularly notable improvements in F1-score and AUC.
The proposed SLE-Stacking method effectively enhances the robustness and generalization capability of diabetes prediction models and provides a feasible new approach for the application of medical artificial intelligence in chronic disease risk assessment and auxiliary diagnosis.
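To make the idea of sample-level entropy weighting concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the exact weighting formula, so the sketch assumes one plausible choice: for each sample, every base learner's positive-class probability is scored by its binary Shannon entropy, and the fusion weight is proportional to one minus that entropy. The function names `binary_entropy` and `entropy_weighted_fusion` and the small constant `eps` are hypothetical and introduced only for illustration.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Shannon entropy in bits of Bernoulli predictions p = P(y = 1)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))


def entropy_weighted_fusion(probs, eps=1e-12):
    """Fuse base-learner probabilities with per-sample entropy weights.

    probs: array of shape (n_samples, n_learners) holding each base
    learner's predicted probability of the positive class.
    Returns (fused, weights): fused has shape (n_samples,), and
    weights has shape (n_samples, n_learners) with rows summing to 1.

    ASSUMPTION: weight is proportional to (1 - entropy); the source
    abstract does not state the exact formula.
    """
    probs = np.asarray(probs, dtype=float)
    # Confidence is 1 for a fully certain prediction and 0 at p = 0.5.
    confidence = np.clip(1.0 - binary_entropy(probs, eps), 0.0, 1.0)
    # eps keeps the weights defined when every learner outputs 0.5.
    confidence = confidence + eps
    weights = confidence / confidence.sum(axis=1, keepdims=True)
    fused = (weights * probs).sum(axis=1)
    return fused, weights


# Two samples, two base learners.
# Sample 0: learner 0 is maximally uncertain (0.5), learner 1 is confident.
# Sample 1: both learners agree, so they receive equal weight.
example = np.array([[0.5, 0.9],
                    [0.2, 0.2]])
fused, weights = entropy_weighted_fusion(example)
labels = (fused >= 0.5).astype(int)
```

In this sketch the uncertain learner contributes almost nothing to the first sample, so the fused probability follows the confident learner, while the second sample falls back to a plain average. A full stacking pipeline would typically compute these weights on out-of-fold base-learner predictions; how SLE-Stacking combines them with a meta-learner is not specified in the abstract.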
De Zhang studied this question.