Many pre-trained AI models remain in use because they perform well and respond quickly on important tasks. New legislation that is aware and critical of these methods now requires explanations from these models. However, most of the original training data are no longer available, and it is impractical to expend resources to train a new explainable model. An alternative is to produce explanations of the pre-trained models themselves, known as post-hoc explanations. The most popular of these methods, LIME, has been widely applied to this problem, but it has unaddressed performance issues, which we presume are caused by the quality of the local data generated to train it, known as perturbations. To fix this issue and clarify the goals of post-hoc explanations, we propose generating perturbed samples from a distribution that better follows patterns in the data. A distribution is used to add noise or to create more local samples, which produces the perturbations. The original choice is the normal distribution, but there are many others. We evaluate each approach by estimating fidelity on real local data, using the nearest neighbors of the explained sample. We find that when features are correlated, multivariate perturbations greatly improve generalizability, more so than other perturbation approaches. This works particularly well for complex post-hoc explainers, and when there is little to gain from feature interactions, there is no noticeable decrease in performance.
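The contrast between independent and multivariate perturbations, and the nearest-neighbor fidelity estimate, can be illustrated with a minimal sketch. It is not the LIME library's implementation and not necessarily the exact procedure evaluated here: the function names, the plain least-squares surrogate (without LIME's proximity weighting), the use of the empirical covariance of a reference dataset, and the mean squared error as the fidelity measure are all assumptions made for illustration.

```python
import numpy as np

def univariate_perturbations(X, x, n_samples, rng):
    """Add independent normal noise per feature (ignores feature correlations)."""
    std = X.std(axis=0)
    return x + rng.normal(0.0, std, size=(n_samples, X.shape[1]))

def multivariate_perturbations(X, x, n_samples, rng):
    """Sample around x from a multivariate normal with the empirical covariance of X."""
    cov = np.cov(X, rowvar=False)
    return rng.multivariate_normal(mean=x, cov=cov, size=n_samples)

def fit_linear_surrogate(Z, y):
    """Least-squares linear surrogate with an intercept (assumed explainer form)."""
    A = np.column_stack([Z, np.ones(len(Z))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_surrogate(coef, Z):
    return np.column_stack([Z, np.ones(len(Z))]) @ coef

def neighbor_fidelity(black_box, coef, X, x, k):
    """MSE between black box and surrogate on the k real samples nearest to x."""
    dist = np.linalg.norm(X - x, axis=1)
    neighbors = X[np.argsort(dist)[:k]]
    residual = black_box(neighbors) - predict_surrogate(coef, neighbors)
    return float(np.mean(residual ** 2))

# Hypothetical setup: two strongly correlated features and a toy black box.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=2000)
black_box = lambda Z: Z[:, 0] * Z[:, 1]
x = X[0]  # the sample being explained

for name, sampler in [("univariate", univariate_perturbations),
                      ("multivariate", multivariate_perturbations)]:
    Z = sampler(X, x, 1000, rng)
    coef = fit_linear_surrogate(Z, black_box(Z))
    print(name, neighbor_fidelity(black_box, coef, X, x, k=50))
```

Lower values indicate that the surrogate agrees more closely with the black box on real nearby data. The sketch makes no claim about which sampler wins on any particular dataset; it only shows where the choice of perturbation distribution enters the pipeline.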
Smith et al. (Mon,) studied this question.