Status: Submitted version (preprint). Currently under review at Machine Learning (Springer Nature, journal 10994). Submission ID: 8f29c533-4613-48a7-9cbf-1f4eddcc83fa.

Abstract. Post-hoc attribution methods are widely deployed to explain deep vision classifiers, yet no systematic evaluation protocol exists for the corrupted-input regime. This paper introduces the first such protocol for attribution stability under distribution shift, validated through a factorial audit of five attribution methods (Integrated Gradients, Grad-CAM, SmoothGrad, GradientSHAP, LIME) under fifteen corruptions at five severity levels on CIFAR-10-C and CIFAR-100-C, across two architecturally distinct classifiers (ResNet-50, ViT-B/16), yielding 760,000 clean-corrupted pairs and five stability metrics.

Key findings.
(1) Attribution stability declines monotonically with corruption severity for all methods (12 to 13 of 15 corruption types significant under Benjamini-Hochberg correction).
(2) Degradation depends strongly on method: SmoothGrad retains Spearman 0.91 for brightness at severity 3 while LIME falls to 0.04 (a roughly twenty-fold gap).
(3) The resolution-fair ranking (SmoothGrad, IG, GradientSHAP) is consistent across 96% of cells (144/150), with method eta² > 0.84 dwarfing architecture eta² < 0.02.
(4) The architecture-by-method interaction is non-significant on CIFAR-10 (p = 0.71) but significant on CIFAR-100 (p = 0.035), revealing task-complexity modulation of the model-agnostic claim.

Files.
(a) paper2_manuscript_preprint.pdf: the submitted manuscript (42 pages, compiled with Springer Nature sn-jnl.cls).
(b) paper2_latex_source.zip: full LaTeX source bundle for compilation reproducibility.

Reproducibility code (separate record). The analysis code and aggregated statistics are archived at 10.5281/zenodo.19689329 (v1.1.0).

License. CC BY 4.0.

Note on peer review. This preprint is the submitted version prior to peer review. If accepted, a revised Author Accepted Manuscript will be uploaded as a new version after any applicable Springer Nature embargo.
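Key finding (2) reports stability as a Spearman value. The following is a minimal sketch of how such a score could be computed for one clean-corrupted pair, assuming the metric is the Spearman rank correlation between the flattened clean and corrupted attribution maps. The function names `average_ranks` and `spearman_stability` are hypothetical and are not taken from the archived analysis code, which may differ in detail.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks of a flattened array; tied values share their mean rank."""
    values = np.asarray(values, dtype=float).ravel()
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    # For each distinct value: the last rank it occupies, shifted back to the
    # midpoint of the block of ranks taken up by its ties.
    last_rank = np.cumsum(counts)
    mean_rank = last_rank - (counts - 1) / 2.0
    return mean_rank[inverse]


def spearman_stability(clean_attr, corrupted_attr):
    """Spearman rank correlation between two attribution maps of equal size.

    Assumption: both maps come from the same attribution method and model,
    computed on a clean image and on its corrupted counterpart.
    """
    a = average_ranks(clean_attr)
    b = average_ranks(corrupted_attr)
    if a.shape != b.shape:
        raise ValueError("attribution maps must contain the same number of elements")
    # Pearson correlation of the ranks is the Spearman coefficient.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    if denom == 0.0:
        return float("nan")  # a constant map has no defined rank correlation
    return float((a * b).sum() / denom)
```

Under this sketch, a score near 1 means the corruption left the ranking of pixel importances almost unchanged, while a score near 0 means the corrupted attribution map ranks pixels in an essentially unrelated order.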
Minyeong Kim studied this question.