Managing blood glucose in type 1 diabetes (T1D) remains a daily clinical challenge, and accurate short-term prediction of glucose levels can meaningfully improve insulin dosing decisions while reducing the risk of dangerous hypoglycaemic episodes. Although numerous machine learning approaches have been proposed for this task, comparing their relative merits is difficult because published studies differ widely in datasets, preprocessing choices, and evaluation criteria. In this work, we address this gap by benchmarking ten machine learning methods on two multi-patient datasets comprising 34 T1D subjects, across prediction horizons of 15, 30, 60, and 120 min. The methods range from a naïve persistence baseline through classical linear regressors, gradient-boosted ensembles, and recurrent neural networks to a novel hybrid that couples LightGBM with stochastic differential equation (SDE)-based glucose–insulin simulation. Every method is trained and tested under identical preprocessing and temporal splitting conditions to ensure a fair comparison. The proposed Hybrid LightGBM-SDE model consistently outperforms all alternatives, recording RMSE values of 22.42 mg/dL at 15 min, 28.74 mg/dL at 30 min, 33.89 mg/dL at 60 min, and 37.22 mg/dL at 120 min, an improvement of 13.6% to 27.0% relative to standalone LightGBM. At the clinically important 30 min horizon, 99.7% of predictions lie within the acceptable A and B zones of the Clarke Error Grid. Wilcoxon signed-rank tests confirm that performance differences are statistically significant (p < 10⁻¹⁰), and SHAP-based analysis shows that the SDE-derived simulation features are among the most influential predictors, especially at longer horizons. All source code and evaluation scripts are publicly released to support reproducibility.
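To make the evaluation protocol concrete, the following minimal sketch shows how a naïve persistence baseline, a chronological train/test split, and per-horizon RMSE could be computed. It is an illustration under stated assumptions, not the released evaluation code: the 5 min sampling interval, the 80/20 split fraction, and all function names are hypothetical.

```python
import math

SAMPLE_MINUTES = 5  # assumed CGM sampling interval; not stated in the text


def temporal_split(series, train_fraction=0.8):
    """Split a time-ordered series chronologically, without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def persistence_forecast(series, horizon_minutes):
    """Pair each reading with the reading one horizon later.

    The persistence baseline predicts that glucose at t + horizon
    equals glucose at t. Returns (predictions, targets).
    """
    steps = horizon_minutes // SAMPLE_MINUTES
    if steps <= 0 or steps >= len(series):
        raise ValueError("horizon does not fit the series")
    predictions = series[:-steps]
    targets = series[steps:]
    return predictions, targets


def rmse(predictions, targets):
    """Root-mean-square error in the units of the inputs (mg/dL here)."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to a baseline."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse
```

Calling `persistence_forecast(test_series, h)` for each `h` in 15, 30, 60, and 120 and passing the result to `rmse` gives the baseline error at each horizon. `relative_improvement` expresses the gain of one model over another in the same percentage form used for the comparison with standalone LightGBM.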
Because of temporary data access constraints, all experiments reported here use physics-based synthetic datasets generated from the Bergman minimal model, replicating the structural properties of the D1NAMO and HUPA-UCM collections; validation on the original clinical recordings is planned. Of the two synthetic datasets, the D1NAMO-equivalent cohort (nine patients) proves more challenging, with systematically higher per-patient RMSE variance. The clinically acceptable prediction accuracy at the 30 min horizon (99.7% in Clarke zones A + B) suggests potential for integration into insulin dosing decision-support systems.
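As a rough illustration of how a synthetic glucose trace can be generated from the Bergman minimal model, the sketch below integrates the three standard minimal-model states (glucose, remote insulin action, plasma insulin) with an Euler-Maruyama step. Everything specific here is an assumption: the parameter values are illustrative textbook-scale numbers, the meal and bolus are simple rectangular inputs, and the additive noise on glucose is one possible choice of diffusion term. None of it is taken from the authors' data generator.

```python
import math
import random


def simulate_bergman(minutes, dt=1.0, sigma=0.0, seed=0,
                     meal_start=60.0, meal_minutes=30.0, meal_rate=3.0,
                     bolus_start=60.0, bolus_minutes=10.0, bolus_rate=0.0):
    """Return a glucose trace (mg/dL) sampled every dt minutes.

    All parameter values are illustrative assumptions.
    """
    p1 = 0.028       # glucose effectiveness, 1/min
    p2 = 0.025       # decay of remote insulin action, 1/min
    p3 = 1.3e-5      # insulin sensitivity gain, mL/(uU*min^2)
    n = 0.09         # plasma insulin clearance, 1/min
    g_basal = 100.0  # basal glucose, mg/dL
    i_basal = 10.0   # basal insulin, uU/mL

    rng = random.Random(seed)
    glucose, action, insulin = g_basal, 0.0, i_basal
    trace = [glucose]
    for k in range(int(minutes / dt)):
        t = k * dt
        # Rectangular meal (mg/dL/min) and insulin (uU/mL/min) inputs.
        meal = meal_rate if meal_start <= t < meal_start + meal_minutes else 0.0
        bolus = bolus_rate if bolus_start <= t < bolus_start + bolus_minutes else 0.0
        # Drift terms of the minimal model.
        d_glucose = -(p1 + action) * glucose + p1 * g_basal + meal
        d_action = -p2 * action + p3 * (insulin - i_basal)
        d_insulin = -n * (insulin - i_basal) + bolus
        # Euler-Maruyama step: additive Gaussian shock on glucose only.
        shock = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        glucose = max(glucose + d_glucose * dt + shock, 0.0)
        action += d_action * dt
        insulin += d_insulin * dt
        trace.append(glucose)
    return trace
```

With `sigma=0.0` the simulation is deterministic, which is convenient for checking the drift terms. A positive `sigma` with a fixed `seed` yields reproducible noisy traces, and varying the parameters per simulated patient is one way a multi-patient synthetic cohort could be built.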
Kolev et al. (Fri,) studied this question.