Background: Electroencephalography (EEG) interpretation for epilepsy diagnosis faces persistent challenges, including specialist shortages, variable interpretation accuracy, and limited accessibility. Artificial intelligence (AI)-based automated interpretation systems promise to address these limitations, yet their diagnostic performance relative to clinical experts remains incompletely characterized.
Objective: To systematically evaluate and meta-analyze the diagnostic accuracy of AI-enabled EEG interpretation compared with human clinical experts for epilepsy detection.
Methods: We conducted a systematic review following PRISMA-DTA and Cochrane MECIR standards, searching PubMed, Scopus, IEEE Xplore, and Google Scholar through June 2025. Studies that directly compared AI algorithms with human expert interpretation on identical EEG datasets were included. Quality was assessed using QUADAS-AI criteria. Bivariate meta-analysis was performed to estimate pooled sensitivity, specificity, diagnostic odds ratios, and likelihood ratios. The AI algorithms evaluated in these studies were trained on expert-labeled EEG data through supervised learning; our comparison therefore concerns diagnostic accuracy outcomes rather than algorithmic independence from expert knowledge.
Results: Three studies encompassing 9,775 EEG examinations met the inclusion criteria. AI demonstrated higher pooled sensitivity (0.83–0.85 vs. 0.77–0.80) and specificity (0.83–0.85 vs. 0.72–0.75) than human experts. The diagnostic odds ratio for AI was approximately double that of humans (12–13 vs. 6–7). AI exhibited consistently narrower confidence intervals, indicating greater interpretive reliability. For normal versus abnormal EEG classification, AI achieved 86–90% sensitivity with greater consistency than human evaluators. However, substantial heterogeneity (I² > 75%) and methodological limitations were identified across studies.
Conclusions: AI-based EEG interpretation demonstrates diagnostic performance equal to or better than that of human experts, with greater consistency, supporting its potential use as a clinical triage tool. However, limited transparency, patient selection bias, and deployment feasibility constraints warrant further investigation before widespread clinical adoption.
Multiple sources of uncertainty affect AI-based diagnostic systems in clinical applications. Internal uncertainties include model parameter uncertainty, threshold selection variability, and training data limitations. External uncertainties encompass population heterogeneity, EEG acquisition variability, and differences in clinical context. Parametric uncertainties arise from model architecture choices, while non-parametric uncertainties reflect distribution-free variations in real-world data. The magnitude and structure of these uncertainties are often unknown in operational settings, so safe clinical deployment requires robust uncertainty quantification methods.
Reis et al. (Wed,) studied this question.
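
The accuracy measures named in the Methods (sensitivity, specificity, likelihood ratios, and diagnostic odds ratio) can be illustrated for a single 2x2 table. The following is a minimal sketch only: the counts are hypothetical and not taken from any included study, the function names are illustrative, and the calculation is a per-table computation with a Wilson score interval, not the bivariate random-effects model used for pooling in the review. It also assumes no cell of the table is zero.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures for one 2x2 table (assumes no zero cells)."""
    sens = tp / (tp + fn)          # true positive rate
    spec = tn / (tn + fp)          # true negative rate
    lr_pos = sens / (1 - spec)     # positive likelihood ratio
    lr_neg = (1 - sens) / spec     # negative likelihood ratio
    dor = lr_pos / lr_neg          # diagnostic odds ratio = (tp*tn)/(fp*fn)
    return {"sensitivity": sens, "specificity": spec,
            "lr_pos": lr_pos, "lr_neg": lr_neg, "dor": dor}


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts for illustration only.
tp, fp, fn, tn = 84, 15, 16, 85
metrics = diagnostic_metrics(tp, fp, fn, tn)
sens_ci = wilson_interval(tp, tp + fn)   # interval for sensitivity
spec_ci = wilson_interval(tn, tn + fp)   # interval for specificity

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"sensitivity 95% CI: {sens_ci[0]:.3f} to {sens_ci[1]:.3f}")
print(f"specificity 95% CI: {spec_ci[0]:.3f} to {spec_ci[1]:.3f}")
```

The interval width shrinks as the number of examinations grows, which is one simple way to see why estimates from larger datasets have narrower confidence intervals. A pooled diagnostic odds ratio from a meta-analytic model need not equal the value obtained by plugging pooled sensitivity and specificity into the formula above.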