What question did this study set out to answer?

To establish a protocol for evaluating how stable attribution methods remain when the inputs to vision classifiers are corrupted.

April 30, 2026 | Open Access

Systematic Vulnerability Audit of Post-Hoc XAI under Common Corruptions: A Factor Analysis Across Vision Benchmarks

Key Points

Introduced a systematic evaluation protocol for attribution stability under input corruption
Conducted a factorial audit of five attribution methods (Integrated Gradients, Grad-CAM, SmoothGrad, GradientSHAP, LIME)
Assessed the methods under fifteen corruptions at five severity levels, across two classifiers (ResNet-50, ViT-B/16)
Analyzed 760,000 clean-corrupted pairs
Employed five stability metrics for evaluation
Attribution stability declines monotonically with corruption severity for all methods
SmoothGrad retains a Spearman correlation of 0.91 for brightness at severity 3, while LIME falls to 0.04
The resolution-fair ranking (SmoothGrad, IG, GradientSHAP) is consistent across 96% of cells (144/150)
Architecture-by-method interactions exhibit significant variation based on complexity
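
One of the stability metrics reported for this study is a Spearman correlation, presumably computed between the attribution map of a clean image and the map of its corrupted counterpart. The following is a minimal sketch of such a metric under that assumption, not the paper's implementation: the function names `average_ranks` and `spearman_stability` are hypothetical, and maps are assumed to be 2-D lists of floats compared pixel by pixel.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_stability(clean_map, corrupted_map):
    """Spearman rank correlation between two attribution maps (2-D lists)."""
    a = [v for row in clean_map for v in row]
    b = [v for row in corrupted_map for v in row]
    if len(a) != len(b):
        raise ValueError("maps must have the same number of pixels")
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a constant map has no defined rank correlation
    return cov / sqrt(var_a * var_b)


# Toy 2x2 maps (illustrative values, not data from the study).
clean = [[0.1, 0.2], [0.3, 0.4]]
same_order = [[1.0, 4.0], [9.0, 16.0]]   # pixel ranking preserved
reversed_order = [[0.4, 0.3], [0.2, 0.1]]  # pixel ranking inverted
print(spearman_stability(clean, same_order))
print(spearman_stability(clean, reversed_order))
```

Because the score depends only on pixel rankings, a corruption that rescales attributions without reordering them leaves it unchanged, while a score near zero (as reported for LIME) means the corrupted map's ranking is almost unrelated to the clean one.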

Abstract

Status: Submitted version (preprint). Currently under review at Machine Learning (Springer Nature, journal 10994). Submission ID: 8f29c533-4613-48a7-9cbf-1f4eddcc83fa.

Abstract. Post-hoc attribution methods are widely deployed to explain deep vision classifiers, yet no systematic evaluation protocol exists for the corrupted-input regime. This paper introduces the first such protocol for attribution stability under distribution shift, validated through a factorial audit of five attribution methods (Integrated Gradients, Grad-CAM, SmoothGrad, GradientSHAP, LIME) under fifteen corruptions at five severity levels on CIFAR-10-C and CIFAR-100-C, across two architecturally distinct classifiers (ResNet-50, ViT-B/16), yielding 760,000 clean-corrupted pairs and five stability metrics.

Key findings.
(1) Attribution stability declines monotonically with corruption severity for all methods (12 to 13 of 15 corruption types significant under Benjamini-Hochberg correction).
(2) Degradation depends dramatically on method: SmoothGrad retains Spearman 0.91 for brightness at severity 3 while LIME falls to 0.04 (twenty-fold gap).
(3) The resolution-fair ranking (SmoothGrad, IG, GradientSHAP) is consistent across 96% of cells (144/150), with method eta² > 0.84 dwarfing architecture eta² [...]

[...] preprint option (removes "Under review as submission to TMLR" header)
Author info now displayed: Minyeong Kim, Independent Researcher, Gyeonggi-do, South Korea
Manuscript content unchanged from v2; only the author-display mode changed, for accurate preprint distribution on Zenodo / arXiv / ResearchGate / Academia.edu

v2 (2026-04-28): TMLR submission revision
Key improvements over v1:
§3.2 Datasets: Added pre-emptive justification of CIFAR-10/100-C scope choice
§5.1.1 LIME: Reframed sign-reversal as a deployment-relevant finding
§7 Conclusion: Future work expanded with Cohen 2019 / Angelopoulos / Heskes
§10 Statements: AI declaration restructured into explicit dual-list
§2.7 Related Work: Added SAE interpretability discussion (Cunningham 2023, Templeton 2024)
§3.5 Methodology: Added 6-step pipeline pseudocode
All Tables: Added directional arrows for notation clarity
Bibliography: 68 references (added Cunningham 2023, Templeton 2024)
HuggingFace checkpoint URLs moved to supplementary material for double-blind
Total: 35 pages, 934 KB PDF.
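
Key finding (1) in the abstract relies on Benjamini-Hochberg correction across the fifteen corruption types. As a reminder of what that step-up procedure does, here is a minimal sketch; the function name `benjamini_hochberg` is hypothetical, the level `alpha=0.05` is an assumption (the abstract does not state the level used), and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its rank-scaled threshold.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Illustrative p-values only (alpha = 0.05 is an assumed level).
p = [0.03, 0.035, 0.036, 0.9]
print(benjamini_hochberg(p))
```

In this toy case the two smallest p-values each miss their own thresholds (0.0125 and 0.025), but the third (0.036) falls under 0.0375, so all three are rejected together. That step-up behaviour is what distinguishes the procedure from checking each p-value against its threshold in isolation.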
