What question did this study set out to answer?

This review aims to establish a systematic benchmark for machine learning applications in EEG-based dementia diagnosis using the AHEPA dataset.

May 20, 2026 · Open Access

The AHEPA EEG benchmark: setting the standard for machine learning in dementia diagnosis, a scoping review

Key Points

Reviewed 46 studies using the AHEPA dataset, stratifying them into three validity tiers.
Evaluated methodologies including subject-level validation and epoch-level cross-validation.
Analyzed performance metrics and accuracy across various machine learning approaches.
For AD versus cognitively normal controls, mean accuracy fell from 90.81% across all studies to 82.11% in Validity-1 studies; for FTD versus controls, it fell from 86.53% to 75.18%.
Weaker validation protocols were associated with a systematic increase of 7–10 percentage points in reported accuracy.
Deep and hybrid models reported the highest nominal accuracies, but traditional algorithms performed comparably under proper validation, indicating that data leakage often drives the apparent gains.

Abstract

Accurate and reproducible electroencephalography (EEG)-based classification of dementia remains a key challenge in computational neurodiagnostics. The open-access AHEPA dataset has become the most commonly used benchmark for Alzheimer’s disease (AD) and Frontotemporal dementia (FTD) classification, yet reported results vary widely due to methodological inconsistencies. This study presents the first systematic and quantitative benchmark review of all published machine learning approaches applied to the AHEPA dataset. Forty-six studies were reviewed and stratified into three validity tiers according to their evaluation rigor, with Validity 1 representing the highest methodological rigor and Validity 3 the lowest: (1) subject-level validation (e.g., Leave-One-Subject-Out cross-validation, LOSO-CV), (2) subject-level train/test splits, and (3) epoch-level k-fold cross-validation. Performance metrics were normalized across classification problems. The analysis revealed that methodological rigor is inversely correlated with reported accuracy: for AD versus Cognitively Normal controls, mean accuracy decreased from 90.81% overall to 82.11% in Validity-1 studies; for FTD versus controls, accuracy dropped from 86.53% to 75.18%. Linear regression analyses demonstrated that weaker validation protocols were associated with systematic increases of 7–10 percentage points in reported accuracy, explaining more than half of the observed performance variance. Deep and hybrid models reported the highest nominal accuracies, but under proper validation, traditional algorithms performed comparably, indicating that data leakage often drives apparent improvements. The review also highlights the lack of cross-configuration generalization and the urgent need for adaptive, montage-independent methodologies.
Overall, this benchmark establishes the first reproducible reference framework for EEG-based dementia classification on the AHEPA dataset, providing quantitative baselines and validity criteria against which all future studies should be evaluated.
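
The difference between the highest and lowest validity tiers can be illustrated with a minimal sketch. It is not taken from the reviewed studies: the helper names (`leave_one_subject_out`, `epoch_level_kfold`, `shared_subjects`), the toy subject labels, and the round-robin fold assignment are all assumptions made for illustration. The sketch only shows why splitting at the epoch level lets the same subject appear in both the training and test sets, whereas LOSO-CV does not.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)
    for held_out, test_idx in by_subject.items():
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


def epoch_level_kfold(n_epochs, k):
    """Yield (train_idx, test_idx), assigning epochs to folds round-robin
    and ignoring which subject each epoch came from (hypothetical scheme)."""
    for fold in range(k):
        test_idx = [i for i in range(n_epochs) if i % k == fold]
        train_idx = [i for i in range(n_epochs) if i % k != fold]
        yield train_idx, test_idx


def shared_subjects(subject_ids, train_idx, test_idx):
    """Return the subjects that contribute epochs to both train and test."""
    train_subjects = {subject_ids[i] for i in train_idx}
    test_subjects = {subject_ids[i] for i in test_idx}
    return train_subjects & test_subjects


# Toy data (assumed): 3 subjects, 4 epochs each, listed subject by subject.
subject_ids = ["s1"] * 4 + ["s2"] * 4 + ["s3"] * 4

loso_leaks = [
    shared_subjects(subject_ids, train, test)
    for _, train, test in leave_one_subject_out(subject_ids)
]
kfold_leaks = [
    shared_subjects(subject_ids, train, test)
    for train, test in epoch_level_kfold(len(subject_ids), k=3)
]

print("LOSO overlap per fold:", loso_leaks)
print("Epoch-level overlap per fold:", kfold_leaks)
```

In this toy setup every LOSO fold has no subject overlap, while every epoch-level fold tests on subjects whose other epochs were seen during training, which is the leakage pattern the review associates with inflated accuracy.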
