April 25, 2026 · Open Access

Machine Learning Models for Early Prediction of Type 1 Diabetes: A Systematic Review

Key Points

This review assesses the performance of machine learning (ML) models in predicting Type 1 diabetes (T1D) onset and early outcomes.
A structured search was conducted in multiple databases, including PubMed and Scopus, for studies published between 2021 and 2025.
Eligible studies developed or validated ML models specifically for T1D prediction or early detection.
Risk of bias was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST).
Fourteen studies were included, with sample sizes ranging from 32 to over 800,000 participants.
ML models showed variable performance (AUROC 0.73-0.92), with prediction horizons ranging from minutes to years.
Only three studies performed external validation, raising concerns about model generalizability.
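AUROC, the metric reported across the reviewed models, is the probability that a randomly chosen person who develops the outcome receives a higher predicted risk than a randomly chosen person who does not: 0.5 is chance-level ranking and 1.0 is perfect ranking. The minimal Python sketch below computes it directly from that pairwise definition on hypothetical toy data. The function name `auroc` and the data are illustrative assumptions; this is not the computation used by any of the reviewed studies.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 = developed the outcome (e.g. T1D onset), 0 = did not.
    scores: model-predicted risk; higher means more likely to be a case.
    A tie between a case and a non-case counts as half a correct ranking.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUROC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0  # case correctly ranked above non-case
            elif c == n:
                wins += 0.5  # tie: half credit
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical toy data, not drawn from any reviewed study.
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    p = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
    print(f"AUROC = {auroc(y, p):.3f}")  # prints "AUROC = 0.933"
```

Here 14 of the 15 case/non-case pairs are ranked correctly (the case scored 0.4 falls below the non-case scored 0.6), giving 14/15.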

Abstract

Type 1 diabetes (T1D) is a chronic autoimmune condition with rising global incidence. Early prediction of disease onset and detection of preclinical progression are critical for timely intervention. Machine learning (ML) can analyze complex, high-dimensional data and may improve risk prediction across the stages of T1D development. This systematic review evaluates the application and performance of ML models for predicting T1D onset and early disease-related outcomes. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a structured search was conducted in PubMed, British Medical Journals, Scopus, IEEE Xplore, and Web of Science for studies published between 2021 and 2025. Eligible studies developed or validated ML models for T1D prediction or early detection. Study selection, data extraction, and risk of bias assessment with the Prediction model Risk of Bias Assessment Tool (PROBAST) were performed, and findings were synthesized narratively because of heterogeneity in study design, populations, prediction targets, and outcome measures. Fourteen studies were included, with sample sizes ranging from 32 to over 800,000 participants. ML approaches included logistic regression, random forests, support vector machines, and gradient boosting methods. Reported performance varied, with area under the receiver operating characteristic curve (AUROC) values of 0.73-0.92, and prediction horizons ranged from short-term outcomes (minutes to hours) to long-term disease onset (up to 10 years). However, study heterogeneity was substantial, and only three studies performed external validation. Although most studies were rated at low risk of bias, several high-performing models were based on small samples or limited validation, raising concerns about overfitting and generalizability.
ML models demonstrate potential for improving prediction of T1D onset and early disease-related outcomes, but current evidence is limited by variability in methods, inconsistent validation, and uncertain clinical applicability. Future research should prioritize large, prospective, and externally validated studies, with greater emphasis on model transparency, generalizability, and real-world implementation.
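The concern about small samples and missing external validation can be made concrete with a small simulation. The Python sketch below generates hypothetical cohorts whose 200 features are pure noise, keeps the single feature that best separates 32 training participants (matching the smallest sample size in the review, used here purely for illustration), and then scores that same feature on an independent cohort of 1,000. Selecting one feature by training AUROC is a deliberately simple stand-in for model fitting, not a method used by any reviewed study; all function names and parameters are illustrative assumptions, and nothing here reflects the review's data.

```python
import random
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC; ties count as half a correct ranking."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def noise_cohort(rng: random.Random, n_per_class: int, n_features: int):
    """Hypothetical cohort whose features carry no information about the label."""
    labels = [1] * n_per_class + [0] * n_per_class
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                for _ in labels]
    return labels, features


def column(rows: List[List[float]], j: int) -> List[float]:
    """Extract feature j for every participant."""
    return [row[j] for row in rows]


def simulate(seed: int = 0, n_train_per_class: int = 16,
             n_test_per_class: int = 500, n_features: int = 200):
    rng = random.Random(seed)
    y_train, x_train = noise_cohort(rng, n_train_per_class, n_features)
    # "Model development": keep the single feature that ranks the small
    # training sample best. This stands in for any flexible fitting step.
    best = max(range(n_features),
               key=lambda j: auroc(y_train, column(x_train, j)))
    apparent = auroc(y_train, column(x_train, best))
    # "External validation": apply the same choice to an independent cohort.
    y_test, x_test = noise_cohort(rng, n_test_per_class, n_features)
    external = auroc(y_test, column(x_test, best))
    return apparent, external


if __name__ == "__main__":
    apparent, external = simulate()
    print(f"apparent AUROC (n=32):   {apparent:.2f}")
    print(f"external AUROC (n=1000): {external:.2f}")
```

Because the features carry no signal, the apparent AUROC on the small training sample typically lands well above 0.5, while the AUROC on the independent cohort stays near 0.5. That gap is the optimism that external validation is designed to expose.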
