What question did this study set out to answer?

This research aims to compare various explainable AI techniques for interpreting CNN decisions in Parkinson’s disease diagnostic tests.

March 26, 2026 · Open Access

Comparative Evaluation of XAI Techniques for CNN Interpretation in Parkinson’s Drawing Test Classification

Key Points

Integrated XAI techniques with CNN models for motor assessment classification.
Evaluated ScoreCAM, GradCAM, AblationCAM, SHAP, and others using the ConvNeXtV2 architecture.
Assessed explanations quantitatively with infidelity metrics and qualitatively through expert interpretation and visualizations.
Achieved an accuracy of 0.90 and an F1 score of 0.89 in classification tasks.
Found varying mean infidelity values among techniques, ranging from 5e-08 to 5e-02.
AblationCAM and GradCAM provided clinically plausible heatmaps with low infidelity (<1e-03) in spiral drawings.
SHAP delivered detailed stroke-specific insights in line tests, though with higher infidelity.

Abstract

This study explores the integration of explainable artificial intelligence (XAI) techniques with convolutional neural networks (CNNs) to classify digitised motor assessments, specifically Archimedean spiral and line drawing tests, for Parkinson’s disease (PD) diagnostics. Although CNNs have achieved high accuracy in analyzing motor patterns, their opaque decision processes limit clinical trust and adoption. To address this, we evaluated post hoc explanation methods, including ScoreCAM, GradCAM, AblationCAM, SHAP, and others, in combination with the ConvNeXtV2 architecture, which achieved strong classification performance (accuracy 0.90, F1 score 0.89). Explanations were assessed both quantitatively, using the infidelity metric, and qualitatively through visualisation and expert interpretation. Our results revealed a wide range of mean infidelity values across methods, from 5e-08 to 5e-02. Methods like AblationCAM and GradCAM produced compact, clinically plausible heatmaps with consistently low infidelity scores (e.g., <1e-03), especially in spiral drawings where attention aligned with distortion-prone regions. SHAP offered more granular, stroke-specific attributions in line tests, despite slightly higher infidelity. In contrast, ScoreCAM yielded spatially diffuse activations with poor fidelity, often highlighting non-informative areas. While several methods generated visually interpretable outputs, inconsistencies across techniques and limited overlap with clinically relevant motor features highlighted persistent gaps. No method reliably focused on full stroke trajectories or regions typically affected by PD, such as areas showing micrographia or tremor. These findings suggest that current XAI techniques, although helpful in analyzing localised model behavior, fall short of producing explanations that are both faithful and clinically meaningful.
To support real-world adoption in neurological diagnostics, future XAI approaches must go beyond attribution heatmaps, incorporating domain knowledge, modeling global motor patterns, and aligning outputs with known disease mechanisms.
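The abstract reports infidelity scores without defining the metric. As a rough illustration only, the sketch below assumes the commonly used definition from Yeh et al. (2019): the expected squared gap between the change an attribution map predicts for a random perturbation and the change the model actually shows. The function name, the Gaussian perturbation, the noise scale, and the toy linear model are all assumptions made for this example; the summary does not say which implementation or settings the authors used.

```python
import numpy as np

def infidelity(model_fn, x, attribution, n_samples=1000, noise_std=0.1, seed=0):
    """Monte Carlo estimate of E_I[(I . attribution - (f(x) - f(x - I)))^2].

    model_fn maps an input array to a scalar score. The Gaussian perturbation I
    and its scale are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    attribution = np.asarray(attribution, dtype=float)
    fx = model_fn(x)
    errors = np.empty(n_samples)
    for i in range(n_samples):
        perturbation = rng.normal(0.0, noise_std, size=x.shape)
        # Output change the explanation predicts for this perturbation
        predicted_change = np.sum(perturbation * attribution)
        # Output change the model actually produces
        actual_change = fx - model_fn(x - perturbation)
        errors[i] = (predicted_change - actual_change) ** 2
    return float(errors.mean())

# Hypothetical toy model: for a linear scorer, the weights are an exact attribution.
weights = np.array([0.5, -1.0, 2.0])

def linear_model(x):
    return float(weights @ x)

x = np.array([1.0, 2.0, 3.0])
exact = infidelity(linear_model, x, weights)       # faithful explanation
wrong = infidelity(linear_model, x, np.zeros(3))   # uninformative explanation
print(exact, wrong)
```

Under this definition, lower values mean the attribution tracks the model's behavior more closely, which is the sense in which the low scores reported for AblationCAM and GradCAM indicate more faithful explanations than the higher scores for ScoreCAM.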

Cite This Study

Alawode et al. studied this question.

synapsesocial.com/papers/69c4cd65fdc3bde448919bc9 https://doi.org/10.1016/j.procs.2026.03.094
