Explaining the decisions of learned models in human-interpretable language is critical for building trustworthy AI. Much of the current work on explainable AI in image classification focuses on generating local (image-specific) explanations that highlight parts of the image responsible for decisions. Although valuable, these local explanations are often not human-interpretable, lack generalizability across entire classes or datasets, and fail to cater to a diverse set of stakeholders. In this work, we introduce a novel framework for generating global explanations in terms of human-aligned concepts that are applicable to any image-based classifier, irrespective of the architecture. Our methods provide both local and global explanations across multiple images from multiple classes. We present our framework for generating global explanations along with experimental results on multiple datasets that demonstrate the effectiveness of our technique. Our method achieves a test coverage of 99.3% on the Stanford Cars dataset.
Vasu et al. (Thu,) studied this question.