What question did this study set out to answer?

The study asks how much feature attributions disagree across different models even when those models make the same predictions.

February 20, 2026 · Open Access

The Explanation Lottery: Cross-Model Feature Attribution Disagreement Despite Prediction Consensus

Key Points

Analyzed 24 datasets across multiple model classes
Compared feature attributions across models on instances where their predictions were identical
Developed a reliability score to measure explanation stability
Performed extensive statistical analysis to validate findings
Found significant variability in feature attributions across models with the same predictions
Demonstrated that explanation disagreements occur even in accurate predictive models
Established a reliability score that quantifies the stability of explanations

Abstract

This work investigates cross-model explanation disagreement conditional on prediction agreement, termed the “Explanation Lottery.” We conduct a large-scale empirical study across 24 datasets and multiple model classes, demonstrating substantial variability in feature attributions despite identical predictions. We propose a reliability score to quantify explanation stability and validate findings through extensive statistical analysis. This manuscript is a preprint of research currently under review at Transactions on Machine Learning Research (TMLR). Full experimental pipeline and implementation details are available from the authors upon reasonable request.
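To make "explanation disagreement conditional on prediction agreement" concrete, here is a minimal sketch in Python. It is not the authors' pipeline, and it is not the paper's reliability score, neither of which is specified here. It assumes per-instance attribution matrices from any attribution method for two models, and uses mean Spearman rank correlation of absolute attributions as an illustrative agreement measure. The function names `rank_rows` and `attribution_agreement` are hypothetical.

```python
import numpy as np


def rank_rows(a):
    """Ordinal ranks within each row (ties broken by position)."""
    return np.argsort(np.argsort(a, axis=1), axis=1).astype(float)


def attribution_agreement(attr_a, attr_b, pred_a, pred_b):
    """Mean Spearman correlation between two models' feature rankings,
    computed only on instances where both models predict the same label.

    attr_a, attr_b: arrays of shape (n_instances, n_features), n_features >= 2
    pred_a, pred_b: arrays of shape (n_instances,)
    Returns (mean_correlation, number_of_agreeing_instances).
    """
    agree = np.asarray(pred_a) == np.asarray(pred_b)
    if not agree.any():
        return float("nan"), 0

    # Rank features by attribution magnitude (an assumption; signed
    # attributions could be ranked instead).
    ra = rank_rows(np.abs(np.asarray(attr_a)[agree]))
    rb = rank_rows(np.abs(np.asarray(attr_b)[agree]))

    # Pearson correlation of the ranks is the Spearman correlation.
    ra -= ra.mean(axis=1, keepdims=True)
    rb -= rb.mean(axis=1, keepdims=True)
    rho = (ra * rb).sum(axis=1) / np.sqrt((ra**2).sum(axis=1) * (rb**2).sum(axis=1))
    return float(rho.mean()), int(agree.sum())


# Toy data: models agree on the first two instances only.
attr_a = np.array([[0.5, 0.1, 0.2], [0.3, 0.2, 0.1], [0.9, 0.0, 0.1]])
attr_b = np.array([[0.5, 0.1, 0.2], [0.1, 0.2, 0.3], [0.9, 0.0, 0.1]])
pred_a = np.array([1, 0, 1])
pred_b = np.array([1, 0, 0])

score, n_agree = attribution_agreement(attr_a, attr_b, pred_a, pred_b)
print(score, n_agree)
```

In the toy data, the two models rank features identically on the first instance and in reverse order on the second, so the average over the two agreeing instances lands at zero even though the predictions match on both.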
