This study explores the integration of explainable artificial intelligence (XAI) techniques with convolutional neural networks (CNNs) to classify digitised motor assessments, specifically Archimedean spiral and line drawing tests, for Parkinson’s disease (PD) diagnostics. Although CNNs have achieved high accuracy in analysing motor patterns, their opaque decision processes limit clinical trust and adoption. To address this, we evaluated post hoc explanation methods, including ScoreCAM, GradCAM, AblationCAM, SHAP and others, in combination with the ConvNeXtV2 architecture, which achieved strong classification performance (accuracy 0.90, F1 score 0.89). Explanations were assessed both quantitatively, using the infidelity metric, and qualitatively, through visualisation and expert interpretation. Our results revealed a wide range of mean infidelity values across methods, from 5e-08 to 5e-02. Methods such as AblationCAM and GradCAM produced compact, clinically plausible heatmaps with consistently low infidelity scores (e.g., <1e-03), especially in spiral drawings, where attention aligned with distortion-prone regions. SHAP offered more granular, stroke-specific attributions in line tests, despite slightly higher infidelity. In contrast, ScoreCAM yielded spatially diffuse activations with poor fidelity, often highlighting non-informative areas. While several methods generated visually interpretable outputs, inconsistencies across techniques and limited overlap with clinically relevant motor features highlighted persistent gaps. No method reliably focused on full stroke trajectories or on regions typically affected by PD, such as areas showing micrographia or tremor. These findings suggest that current XAI techniques, although helpful in analysing localised model behaviour, fall short of producing explanations that are both faithful and clinically meaningful.
To support real-world adoption in neurological diagnostics, future XAI approaches must go beyond attribution heatmaps by incorporating domain knowledge, modelling global motor patterns, and aligning outputs with known disease mechanisms.
Alawode et al. (Thu,) studied this question.
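
The abstract reports infidelity scores without stating how the metric is computed. As commonly defined, infidelity is the expected squared difference between the change an attribution predicts for a perturbation I, namely I · Φ(f, x), and the change the model actually shows, f(x) − f(x − I). The following is a minimal sketch of that definition only, not the study's evaluation pipeline: the toy linear model, the Gaussian perturbations, the noise scale, and all function and variable names are illustrative assumptions.

```python
import random


def infidelity(model, attribution, x, n_samples=1000, noise_std=0.1, seed=0):
    """Monte Carlo estimate of E_I[(I . attribution - (f(x) - f(x - I)))^2]."""
    rng = random.Random(seed)
    baseline_output = model(x)
    total = 0.0
    for _ in range(n_samples):
        # Assumed perturbation distribution: i.i.d. Gaussian noise per feature.
        perturbation = [rng.gauss(0.0, noise_std) for _ in x]
        x_perturbed = [xi - pi for xi, pi in zip(x, perturbation)]
        # Output change the attribution predicts for this perturbation.
        predicted_change = sum(p * a for p, a in zip(perturbation, attribution))
        # Output change the model actually exhibits.
        actual_change = baseline_output - model(x_perturbed)
        total += (predicted_change - actual_change) ** 2
    return total / n_samples


# Hypothetical stand-in for a classifier: a linear scoring function.
weights = [0.5, -1.0, 2.0]


def linear_model(x):
    return sum(w * xi for w, xi in zip(weights, x))


x = [1.0, 2.0, 3.0]

# For a linear model, the weight vector is an exactly faithful attribution,
# so its infidelity is zero up to floating-point error.
faithful = infidelity(linear_model, weights, x)

# An all-zero attribution predicts no change at all, so it scores worse.
unfaithful = infidelity(linear_model, [0.0, 0.0, 0.0], x)

print(f"faithful attribution:   {faithful:.3e}")
print(f"all-zero attribution:   {unfaithful:.3e}")
```

Lower values mean the attribution tracks the model's response to perturbations more closely, which is the sense in which the low scores reported for AblationCAM and GradCAM indicate better fidelity than the higher scores for ScoreCAM. For image models, the perturbation is applied to pixels or regions rather than to a short feature vector, and the choice of perturbation distribution affects the resulting values.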