What question did this study set out to answer?

The study evaluated whether machine learning models applied to routinely collected health data can support the timely identification of dementia cases.

April 10, 2026 | Open Access

Advancing dementia identification using machine learning and real‐world sequential health data

Key Points

The aim was to evaluate whether machine learning models can aid the timely identification of dementia cases using routinely collected health data.
De-identified datasets were used to build a nested case-control sample of 8195 individuals with dementia and 8195 matched controls.
Developed four machine learning models with cross-sectional and longitudinal data features.
Assessed models for performance metrics such as area under the curve, sensitivity, and specificity.
The best model achieved an area under the curve of 0.86 and sensitivity of 73.3%.
Specificity was 87.5%, meaning that most controls without dementia were correctly classified as non-cases.
Key predictors for dementia included time-stamped ICD-10 diagnostic codes, health-care use, referral to aged residential care, and hospital delirium assessments.

Abstract

Introduction

Timely identification of dementia remains a major clinical challenge globally, with many cases going unrecognized. This study evaluated whether machine learning models applied to routinely collected health data can support dementia case finding.

Method

De‐identified datasets were used to create a nested case–control sample of 8195 individuals with dementia and 8195 matched controls. Four models incorporating both cross‐sectional and longitudinal features were developed and tested.

Results

The best‐performing model achieved an area under the curve of 0.86 (95% confidence interval [CI]: 0.84–0.87), with sensitivity of 73.3% (95% CI: 72.3–74.2) and specificity of 87.5% (95% CI: 86.7–88.2). Key predictors included time‐stamped International Classification of Diseases 10th Revision diagnostic codes, health‐care use, referral to aged residential care, and hospital delirium assessments.

Discussion

A distinctive feature was the inclusion of “timestamp data”, which allowed us to assess longitudinal changes and may have improved the performance of the model. These findings demonstrate the potential for using machine learning with routine health data to enhance early dementia detection.

Highlights

Machine learning models using routine health data in a real‐world setting in New Zealand had good accuracy for identification of dementia (area under the curve 0.86, sensitivity 73.3%, and specificity 87.5%).
Longitudinal sequential “timestamp” data is a unique feature that improved the performance of machine learning models for dementia case finding.
Key predictors included International Classification of Diseases 10th Revision codes, health‐care use, referrals to aged residential care, and positive delirium scores.
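For readers less familiar with the reported metrics, the following is a minimal Python sketch of how sensitivity, specificity, and area under the curve are conventionally computed for a binary case/control classifier. It is not the study's code. The function names, the 0.5 decision threshold, and the eight toy labels and scores are all illustrative assumptions and do not come from the study's data.

```python
# Illustrative only: toy labels and scores, NOT data from the study.
# 1 = dementia case, 0 = matched control.

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true cases that were flagged
    specificity = tn / (tn + fp)  # share of controls that were not flagged
    return sensitivity, specificity


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen control, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for four cases and four controls.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 for turning scores into predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

sens, spec = sensitivity_specificity(toy_labels, toy_preds)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"auc={auc(toy_labels, toy_scores):.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is why studies of this kind usually report all three.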
