March 13, 2026 · Open Access

Interpretation drift in explainable AI under label noise

Key Points

The research aims to investigate how class label noise impacts the interpretability of explainable AI models.
Examined the interpretability of rule-learning models under varying levels of label noise
Compared the stability of model performance with changes in interpretability
Conducted empirical evaluations to reveal the impact of label noise on explanation consistency
Model performance may remain stable under increasing label noise
The consistency of explainable model rules declines significantly as label noise increases
Interpretation drift is observed: explanations change substantially even when predictive performance remains stable

Abstract

The comprehensibility and human interpretation of classification models are crucial in many applications, such as decision support systems and knowledge discovery, where explanations drive action. However, the presence of class label noise, widespread in real-life data, can significantly impact the performance and interpretability of data models. This study addresses the problem of interpretability robustness by examining the impact of class label noise on rule-learning models – the models extensively used for discovering transparent, human-readable interpretations of hidden data patterns and decision logic. Our empirical results demonstrate that while model performance may remain stable under increasing label noise, the consistency of explainable model rules suffers significantly. As a result, we uncover a novel and critical phenomenon – interpretation drift – where model explanations change substantially under label noise, even when predictive performance remains stable. This phenomenon can directly impact AI-informed decisions, but is not detectable through conventional performance metrics and therefore poses significant risks in real-world applications reliant on AI explanations. Our findings emphasize the need for standardized, interpretability-aware robustness metrics in the development of trustworthy explainable AI.
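The kind of check the abstract describes can be sketched in a few lines. The following is a minimal illustration only, not the study's experimental setup: it assumes a one-rule learner (a decision stump) as a stand-in for the rule-learning models studied, synthetic data with two nearly redundant features, and symmetric random label flips as the noise model. The names `flip_labels`, `fit_stump`, and `accuracy` are hypothetical helpers introduced for this example.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary labels with round(noise_rate * n) entries flipped."""
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


def accuracy(rule, X, y):
    """Fraction of rows where the rule 'feature j > threshold t' matches the label."""
    j, t = rule
    return sum((row[j] > t) == bool(label) for row, label in zip(X, y)) / len(y)


def fit_stump(X, y):
    """Pick the single rule (feature index, threshold) with the highest training accuracy."""
    best_rule, best_acc = None, -1.0
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            acc = accuracy((j, t), X, y)
            if acc > best_acc:  # strict comparison: the first best rule wins ties
                best_rule, best_acc = (j, t), acc
    return best_rule


if __name__ == "__main__":
    # Assumed toy data: x1 is a noisy copy of x0, so two different rules
    # can explain the labels almost equally well.
    rng = random.Random(42)
    X, y = [], []
    for _ in range(200):
        x0 = rng.random()
        X.append([x0, x0 + rng.gauss(0, 0.02)])
        y.append(int(x0 > 0.5))

    clean_rule = fit_stump(X, y)
    for rate in (0.0, 0.1, 0.2, 0.3):
        rule = fit_stump(X, flip_labels(y, rate, seed=7))
        print(
            f"noise={rate:.1f}  rule: x{rule[0]} > {rule[1]:.3f}  "
            f"accuracy on clean labels={accuracy(rule, X, y):.3f}  "
            f"same feature as clean model={rule[0] == clean_rule[0]}"
        )
```

Reporting the learned rule next to the accuracy is the point of the sketch: an accuracy column alone cannot show whether the explanation has changed. Whether the rule actually switches features in this toy run depends on the seeds and the noise rate, so the output should be read as a diagnostic rather than as a reproduction of the paper's findings.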
