What is the clinical evidence from this study?

Study design: Retrospective cohort. Population: Adults aged 18-70 years followed for incident Type 2 Diabetes Mellitus (T2DM) (n=3,365,464). Intervention: EHR-based T2DM prediction model. Primary outcome: 1-year risk of incident T2DM (validation AUC 0.883, 95% CI 0.880-0.886).

What does this research mean for the field?

An EHR-based machine-learning prediction model using a hazard-based Super Learning approach accurately predicts 1-, 3-, and 10-year risk of incident Type 2 Diabetes with excellent discrimination and calibration. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to create a reliable prediction model for T2DM using electronic health records to improve prevention efforts.

June 7, 2026

2321-P: Machine-Learning Modeling for T2DM Prediction in over 3 Million Adults

Key result

An EHR-based prediction model predicted 1-year risk of incident T2DM with an AUC of 0.883 (95% CI 0.880-0.886) in the validation cohort.

Key points

The study aimed to build a reliable T2DM prediction model from electronic health records to support prevention efforts.
It was a retrospective cohort study of more than 3 million adults aged 18-70 years.
A hazard-based Super Learning approach was used to predict T2DM risk at 1, 3, and 10 years of follow-up.
Predictors included demographics, clinical measures, lifestyle factors, comorbidities, prescriptions, utilization, and novel factors (MASLD and neighborhood-level measures of SES, walkability, and food environment).
T2DM incidence was 10.7 per 1,000 person-years during a median follow-up of 5.4 years.
The model achieved an AUC of 0.886 (95% CI: 0.883-0.888) for 1-year risk in training and 0.883 (95% CI: 0.880-0.886) in validation.
Sensitivity was 80% and specificity 81% at the optimal cut-point (>1.2% risk) for identifying high-risk individuals.
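
The sensitivity and specificity above can be related to the other screening figures in the abstract (top two deciles flagged, number needed to evaluate 25). The sketch below is a back-of-envelope check, not the authors' calculation: it assumes the observed 1-year risk of 1.01% from the abstract as the prevalence and uses the rounded published sensitivity and specificity. The function name `screening_summary` is introduced here for illustration.

```python
# Inputs are the rounded values reported in the abstract.
sensitivity = 0.80   # at the 1.2% risk cut-point
specificity = 0.81   # at the same cut-point
prevalence = 0.0101  # observed 1-year risk of incident T2DM (1.01%)


def screening_summary(sens, spec, prev):
    """Return (fraction flagged, PPV, number needed to evaluate)."""
    true_pos = sens * prev                # flagged and develop T2DM
    false_pos = (1 - spec) * (1 - prev)   # flagged but do not develop T2DM
    flagged = true_pos + false_pos
    ppv = true_pos / flagged
    return flagged, ppv, 1 / ppv


flagged, ppv, nne = screening_summary(sensitivity, specificity, prevalence)
print(f"flagged: {flagged:.1%}, PPV: {ppv:.1%}, NNE: {nne:.1f}")
# prints: flagged: 19.6%, PPV: 4.1%, NNE: 24.3
```

Roughly 20% of the cohort is flagged, which matches the "top two deciles" description, and the number needed to evaluate comes out close to the reported 25. The small gap is plausibly due to rounding in the published inputs.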

Study design

Type

Cohort (n=3,365,464)

Structured PICO

Population

3,365,464 adults aged 18-70 years receiving care at Kaiser Permanente Northern California, followed for a median of 5.4 years to develop and validate a T2DM prediction model.

Exposure

EHR-based machine-learning prediction model (hazard-based Super Learning approach)

Outcome

Incident T2DM at 1-, 3-, and 10-year follow-up

An EHR-based machine-learning model demonstrated excellent discrimination and calibration for predicting 1-, 3-, and 10-year risk of incident T2DM in a large real-world cohort.
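
The abstract does not describe how hazard estimates are turned into 1-, 3-, and 10-year risks. One common construction, shown here only as an illustrative assumption, splits follow-up into discrete intervals, estimates a conditional hazard for each, and takes cumulative risk as one minus the product of the interval survival probabilities. The yearly hazards below are hypothetical placeholders rather than study estimates, and the sketch ignores censoring and competing risks.

```python
from itertools import accumulate
import operator


def cumulative_risk(hazards):
    """Cumulative risk after each interval, given per-interval conditional hazards."""
    # Running product of (1 - hazard) is the probability of remaining event-free.
    survival = accumulate((1.0 - h for h in hazards), operator.mul)
    return [1.0 - s for s in survival]


# Hypothetical yearly hazards for one person (placeholders, not study output).
yearly_hazards = [0.010] * 10
risk = cumulative_risk(yearly_hazards)
for horizon in (1, 3, 10):
    print(f"{horizon}-year risk: {risk[horizon - 1]:.4f}")
```

With a constant 1% yearly hazard, the 10-year risk is slightly below 10% because each year's hazard applies only to people still event-free.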

Numerical result

Effect estimate: AUC 0.883 (95% CI 0.880-0.886)
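
For readers less familiar with the metric, the AUC is the probability that a randomly chosen person who developed T2DM received a higher predicted risk than a randomly chosen person who did not, with ties counted as one half. The sketch below computes it by direct pairwise comparison on toy data. The risks and outcomes are invented for illustration and are unrelated to the study cohort, and a cohort of this size would in practice need a rank-based implementation instead of this quadratic loop.

```python
def auc(risks, outcomes):
    """Pairwise (Mann-Whitney) AUC; ties count as 0.5."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    noncases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in noncases
    )
    return wins / (len(cases) * len(noncases))


# Toy predicted risks and outcomes (1 = developed T2DM), invented for illustration.
toy_risks = [0.002, 0.004, 0.010, 0.015, 0.030, 0.050]
toy_outcomes = [0, 0, 0, 1, 0, 1]
print(auc(toy_risks, toy_outcomes))  # prints 0.875
```

Here 7 of the 8 case/non-case pairs are ranked correctly, giving 0.875. On the same scale, the reported 0.883 means about 88% of such pairs in the validation cohort were ordered correctly.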

Abstract

Introduction and Objective: Over 60% of U.S. adults have risk factors for T2DM, complicating scale-up and sustainability of evidence-based prevention efforts. We developed an EHR-based T2DM prediction model to facilitate real-world implementation.

Methods: We conducted a retrospective cohort study among adults aged 18-70 years receiving care at Kaiser Permanente Northern California from 2012 to 2024, followed until T2DM onset, death, disenrollment, or December 31, 2024. The cohort (N=3,365,464) was randomly split 70:30 for training and validation. We applied a hazard-based Super Learning approach to predict 1-, 3-, and 10-year risk. Incident T2DM was defined using diagnosis codes, glycemic test values, or T2DM medication fills; adults prescribed only metformin, SGLT2 inhibitors, or GLP-1 receptor agonists without a diagnosis code, laboratory value, or another T2DM medication were not classified as T2DM. Predictors included demographics, clinical measures, lifestyle factors, comorbidities, prescriptions, utilization, and novel predictors (MASLD and neighborhood-level measures of SES, walkability, and food environment).

Results: Median age was 39 years (IQR: 28-53), and 55% were female. During a median follow-up of 5.4 years, T2DM incidence was 10.7/1,000 person-years. Within 1-year follow-up, the predictive model achieved an AUC of 0.886 (95% CI: 0.883-0.888) in training and 0.883 (95% CI: 0.880-0.886) in validation, with near-ideal calibration (mean predicted risk 1.03% vs observed 1.01%; slope 1.26). At the optimal cut-point (1.2% risk), which identified the top two deciles of risk, sensitivity was 80%, specificity was 81%, and the number needed to evaluate was 25. Results were consistent for 3- and 10-year follow-up.

Conclusion: This EHR-based prediction model, developed and validated in over 3 million adults, demonstrated excellent discrimination and calibration. It can support clinicians in identifying patients for T2DM prevention programs and pharmacologic interventions, and can enable efficient recruitment for intervention studies. Current work focuses on external validation and future integration into clinical workflows.

Disclosure: L.A. Rodriguez: None. M.M. Yassin: None. R. Neugebauer: None. T.R. Levin: Research Support; Current; Freenome, Inc. Advisory Panel; Current; Geneoscopy, Navatar. A. Gopalan: None. V. Saxena: None. J. An: Research Support; Current; Bayer AG, AstraZeneca. Research Support; Ended; Merck Current; Gilead Sciences, Inc.

Funding: National Institute of Diabetes and Digestive and Kidney Diseases (5K01DK138122 and 1P30DK092924).
