This work investigates cross-model explanation disagreement conditional on prediction agreement, termed the “Explanation Lottery.” We conduct a large-scale empirical study across 24 datasets and multiple model classes, demonstrating substantial variability in feature attributions despite identical predictions. We propose a reliability score to quantify explanation stability and validate findings through extensive statistical analysis. This manuscript is a preprint of research currently under review at Transactions on Machine Learning Research (TMLR). The full experimental pipeline and implementation details are available from the authors upon reasonable request.
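To make the idea of explanation disagreement conditional on prediction agreement concrete, the following is a minimal sketch, not the authors' pipeline. It assumes two models have already produced per-instance predictions and feature attributions (for example from a SHAP-style method), keeps only the instances where the predictions match, and compares the attribution rankings there with a Spearman rank correlation. The function names, the choice of Spearman correlation, and the toy numbers are all illustrative assumptions; the abstract does not define the reliability score, so this sketch does not attempt to reproduce it.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / (var_a * var_b) ** 0.5


def agreement_conditional_correlations(
    preds_a: Sequence[int],
    preds_b: Sequence[int],
    attrs_a: Sequence[Sequence[float]],
    attrs_b: Sequence[Sequence[float]],
) -> List[float]:
    """Per-instance rank correlation of absolute attributions,
    restricted to instances where both models predict the same label."""
    return [
        spearman([abs(v) for v in xa], [abs(v) for v in xb])
        for pa, pb, xa, xb in zip(preds_a, preds_b, attrs_a, attrs_b)
        if pa == pb
    ]


# Toy data (hypothetical): three instances, three features.
preds_a = [1, 0, 1]
preds_b = [1, 0, 0]  # the models disagree on the third instance
attrs_a = [[0.5, 0.2, 0.1], [0.1, 0.4, 0.3], [0.3, 0.3, 0.3]]
attrs_b = [[0.1, 0.2, 0.5], [0.1, 0.4, 0.3], [0.2, 0.1, 0.6]]

correlations = agreement_conditional_correlations(
    preds_a, preds_b, attrs_a, attrs_b
)
print(correlations)
```

In this toy data the two models agree on the first two instances. On the first they rank the features in exactly opposite order, and on the second they rank them identically, which illustrates how identical predictions can coexist with very different explanations.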
Thackshanaramana Balashanmugam studied this question.