What question did this study set out to answer?

This study aims to evaluate the robustness and interpretability of machine learning models in diagnosing Alzheimer’s disease using overlapping clinical predictors.

May 15, 2026

Predictive overlap, feature reduction, and robustness in Machine Learning-based cross-sectional classification of Alzheimer’s disease

Key Points

The study evaluates how robust and interpretable machine learning models are when classifying Alzheimer’s disease from overlapping clinical predictors.
Five machine learning models were compared: logistic regression, support vector machine (SVM), extreme gradient boosting (XGBoost), ridge classifier, and k-nearest neighbors (KNN).
The analysis used 35,635 samples from the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set (UDS) for cross-sectional diagnosis classification.
Three experiments were conducted: baseline modeling, feature reduction to 13 variables, and Gaussian noise perturbation.
XGBoost achieved the highest mean accuracy, the strongest discrimination, and the best calibration.
After feature reduction, performance declined only modestly and the overall ranking of algorithms was preserved.
Under synthetic noise perturbation, most models remained stable; KNN was the most sensitive.
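
The Gaussian noise perturbation experiment listed above can be illustrated with a minimal sketch. The summary does not say how the noise was scaled or applied, so everything specific here is an assumption: the function name `perturb_with_gaussian_noise`, the `noise_scale` parameter, the choice to scale noise by each feature's own standard deviation, and the toy data standing in for the clinical features.

```python
import numpy as np


def perturb_with_gaussian_noise(X, noise_scale=0.1, seed=0):
    """Return a copy of X with zero-mean Gaussian noise added to every feature.

    Assumption (not from the study): the noise standard deviation for each
    column is `noise_scale` times that column's own standard deviation, so
    features on different scales are perturbed proportionally.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    feature_std = X.std(axis=0)  # one standard deviation per column
    unit_noise = rng.normal(loc=0.0, scale=1.0, size=X.shape)
    return X + unit_noise * (noise_scale * feature_std)


# Toy numeric matrix standing in for a 13-variable reduced feature set.
toy_rng = np.random.default_rng(42)
X_toy = toy_rng.normal(size=(200, 13))
X_noisy = perturb_with_gaussian_noise(X_toy, noise_scale=0.1, seed=1)
```

In a robustness check of this kind, a fitted model would typically be scored on both `X_toy` and `X_noisy`, and the drop in each metric compared across algorithms. Distance-based methods such as KNN can be expected to react more strongly, since noise directly changes which neighbors are nearest.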

Abstract

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder with substantial clinical and economic burden, making accurate and interpretable diagnosis classification an important goal. While beta-amyloid and tau are central biomarkers, broader clinical datasets often contain partially overlapping predictors, raising questions about predictive overlap, interpretability, and robustness in machine learning (ML)-based classification. This study compared five ML models for cross-sectional AD diagnosis classification: logistic regression, support vector machine (SVM), extreme gradient boosting (XGBoost), ridge classifier, and k-nearest neighbors (KNN). The data comprised 35,635 samples from the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set (UDS). Performance was evaluated across 50 repeated stratified splits using standard classification metrics. Predictive overlap was assessed using out-of-fold (OOF) single-feature Spearman correlation heatmaps, with supplementary descriptive metrics including mean pairwise absolute difference (MPAD), mean within-sample variance, and root mean square pairwise distance (RMSPD). Three experiments were conducted: baseline modeling, feature reduction to 13 variables, and Gaussian noise perturbation. XGBoost achieved the strongest overall performance, with the highest mean accuracy, strongest discrimination, and best calibration. After feature reduction, performance declined only modestly while preserving the overall ranking of algorithms. Under synthetic perturbation, most models remained stable, with KNN showing the greatest sensitivity to noise. Overall, the results suggest that predictive signal is distributed across partially substitutable features and that smaller, more interpretable feature sets can retain substantial classification performance in cross-sectional AD diagnosis. These findings further motivate future research examining whether redundancy patterns may provide additional insight into how AD-related classification signals are distributed across broader layers of clinical features.
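
The supplementary overlap metrics named in the abstract (MPAD, mean within-sample variance, RMSPD) are not defined in this summary, so the following is only one plausible reading, offered as a sketch. It assumes a matrix `P` in which each row is a sample and each column holds the out-of-fold predicted probability from a model trained on a single feature; the function name `overlap_metrics`, the per-sample-then-average order of operations, and the toy values are all hypothetical.

```python
from itertools import combinations

import numpy as np


def overlap_metrics(P):
    """Summarize how much single-feature predictions disagree per sample.

    P: array of shape (n_samples, n_predictors). Assumed (not confirmed by
    the source) to hold out-of-fold probabilities, one column per feature.
    Lower values mean the predictors give more similar outputs.
    """
    P = np.asarray(P, dtype=float)
    pairs = list(combinations(range(P.shape[1]), 2))
    # Signed difference for every predictor pair: shape (n_samples, n_pairs).
    diffs = np.stack([P[:, i] - P[:, j] for i, j in pairs], axis=1)
    return {
        # Mean pairwise absolute difference, averaged over samples.
        "mpad": float(np.abs(diffs).mean(axis=1).mean()),
        # Variance across predictors within each sample, averaged over samples.
        "mean_within_sample_variance": float(P.var(axis=1).mean()),
        # Root mean square pairwise distance per sample, averaged over samples.
        "rmspd": float(np.sqrt((diffs ** 2).mean(axis=1)).mean()),
    }


# Toy example: the first sample's predictors disagree, the second's agree.
P_toy = [[0.2, 0.4, 0.6],
         [0.5, 0.5, 0.5]]
metrics = overlap_metrics(P_toy)
```

Under this reading, values near zero on all three metrics would indicate that single-feature predictors are largely interchangeable for a given sample, which is the kind of partial substitutability the abstract describes.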
