Explainable Artificial Intelligence (XAI) has become a critical requirement for the responsible deployment of deep learning systems in safety-critical and regulated domains, particularly in medical imaging. In computer vision, gradient-based explanation methods such as Saliency Maps and Gradient-weighted Class Activation Mapping (Grad-CAM) are widely used for interpreting convolutional neural networks (CNNs). However, the increasing adoption of Vision Transformers (ViTs) introduces structural differences in internal representations that challenge the direct transfer of convolutional explainability mechanisms. This study presents a systematic, quantitative, and statistically validated evaluation of gradient-based visual explainability across two CNN architectures (VGG16 and ResNet50) and a Vision Transformer (ViT-B/16), using a domain-specific medical imaging dataset (brain MRI, tumor vs. non-tumor classification). Beyond qualitative heatmap inspection, we conduct deletion-based faithfulness analysis, sensitivity-to-noise evaluation, feature masking validation, and statistical hypothesis testing over 30 independent runs. All models achieve strong predictive performance on the domain dataset (mean accuracy ≈ 0.99), enabling a fair and meaningful comparison of explanation methods across architectures. Results demonstrate that explanation reliability is highly method- and architecture-dependent. Sensitivity differences are consistently statistically significant, whereas deletion-based faithfulness does not always yield equally strong separation under the adopted masking protocol. Masking-based analysis reveals substantial false-positive rates in certain configurations, indicating that visually plausible heatmaps do not necessarily isolate decision-necessary evidence.
These findings underscore the importance of coupling visual explanations with behavioral validation metrics, particularly in high-risk domains governed by emerging regulatory frameworks such as the EU AI Act. Overall, the study advocates for empirically validated, architecture-aware, and statistically grounded approaches to medical XAI.
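To make the deletion-based faithfulness idea concrete, the following is a minimal, generic sketch and not the study's actual protocol, which the abstract does not specify. It assumes a model exposed as a function from an image array to a scalar class score; the names `deletion_curve`, `deletion_auc`, and `toy_model`, the step count, and the zero baseline are all hypothetical choices made for illustration. A lower area under the deletion curve suggests the heatmap ranks decision-relevant pixels first.

```python
import numpy as np

def deletion_curve(model_fn, image, saliency, steps=10, baseline=0.0):
    """Record the model score as the most salient pixels are progressively masked."""
    h, w = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]      # most salient pixels first
    masked = image.astype(float).copy()
    scores = [model_fn(masked)]                      # score on the intact image
    per_step = int(np.ceil(order.size / steps))
    for i in range(steps):
        idx = order[i * per_step:(i + 1) * per_step]
        rows, cols = np.unravel_index(idx, (h, w))
        masked[rows, cols] = baseline                # assumed baseline: constant fill
        scores.append(model_fn(masked))
    return np.array(scores)

def deletion_auc(scores):
    """Trapezoidal area under the deletion curve, with the x-axis scaled to [0, 1]."""
    x = np.linspace(0.0, 1.0, len(scores))
    return float(np.sum((scores[1:] + scores[:-1]) / 2.0 * np.diff(x)))

# Toy stand-in for a classifier: the "score" depends only on the top-left patch.
def toy_model(img):
    return float(img[:4, :4].mean())

image = np.ones((8, 8))
faithful = np.zeros((8, 8))
faithful[:4, :4] = 1.0                               # highlights the patch the model uses
unfaithful = 1.0 - faithful                          # highlights everything else

auc_faithful = deletion_auc(deletion_curve(toy_model, image, faithful, steps=4))
auc_unfaithful = deletion_auc(deletion_curve(toy_model, image, unfaithful, steps=4))
print(auc_faithful, auc_unfaithful)
```

In this toy setting the faithful heatmap drives the score to zero after the first masking step, while the unfaithful one leaves the score untouched until the last step, so the two areas separate clearly. With a real network, `model_fn` would wrap a forward pass returning the predicted-class probability, and the choice of baseline (zeros, blur, dataset mean) is itself a protocol decision that can affect the result.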
Tzirtis et al. (Fri,) studied this question.