For the interpretability of deep neural networks (DNNs) in vision-related tasks, existing explanation methods commonly generate a saliency map based on a linear relation between the output and the input features. However, when an explanation conflicts with human visual inspection, these methods provide no further evidence for analyzing the saliency explanation. Most of them may fail to provide feature attributions with identifiable semantics, or may produce misleading explanations because they are insufficiently robust. In this paper, we first propose four key characteristics (richness, adaptivity, exclusiveness, and fairness) for evaluating existing linear relation-based explanation methods, and then construct an interpretable linear model that satisfies them. We formalize these characteristics and develop a novel explanation method based on them. We extract and reconstruct key exclusive semantic features from the feature map using the Nonnegative Matrix Factorization (NMF) algorithm, use an information entropy model to adaptively determine the number of features and ensure their richness, and then linearly combine the features, with weights fairly assigned by an approximate Shapley value algorithm, to generate the saliency map. Compared with state-of-the-art methods, our explanations across different datasets and DNNs are more convincing and robust in terms of Average Drop (AD), Average Increase (AI), Deletion (Del), and Insertion (Ins). Our supplementary experiments provide sufficient evidence that the four characteristics guarantee the feasibility of feature attribution analysis and enhance the quality of the resulting explanations.
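The pipeline summarized above (NMF on a feature map, an entropy criterion for the number of components, approximate Shapley weights, and a linear combination into a saliency map) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Lee-Seung multiplicative-update NMF, the "effective rank" entropy criterion (exp of the Shannon entropy of the normalized singular values), the Monte Carlo permutation Shapley estimator, and the toy `score` function are all swapped-in or hypothetical choices. In practice, `score` would wrap the DNN's class score on an input masked by the upsampled component maps.

```python
import numpy as np


def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Factor nonnegative V (m x n) as W @ H, W (m x k) >= 0, H (k x n) >= 0,
    via Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def entropy_rank(V, k_max):
    """Hypothetical rank choice: effective rank exp(H(p)) of the normalized
    singular-value spectrum p, rounded and clipped to [1, k_max]."""
    s = np.linalg.svd(V, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    h = -(p * np.log(p)).sum()
    return int(np.clip(round(float(np.exp(h))), 1, k_max))


def shapley_weights(score, masks, n_perm=200, seed=0):
    """Monte Carlo permutation estimate of Shapley values. Player i is the
    spatial mask masks[i]; a coalition's value is score(sum of its masks)."""
    rng = np.random.default_rng(seed)
    k, n = masks.shape
    phi = np.zeros(k)
    for _ in range(n_perm):
        current = np.zeros(n)
        prev = score(current)
        for i in rng.permutation(k):
            current = current + masks[i]
            val = score(current)
            phi[i] += val - prev          # marginal contribution of player i
            prev = val
    return phi / n_perm


def saliency(feature_map, score, k_max=8):
    """feature_map: nonnegative array (C, H, W), e.g. post-ReLU activations."""
    C, h, w = feature_map.shape
    V = feature_map.reshape(C, h * w)
    k = entropy_rank(V, k_max)
    _, comps = nmf(V, k)                  # comps: (k, h*w) spatial bases
    comps = comps / (comps.max(axis=1, keepdims=True) + 1e-9)  # each in [0, 1]
    phi = shapley_weights(score, comps)
    sal = np.maximum(phi @ comps, 0.0)    # weighted linear combination
    return (sal / (sal.max() + 1e-9)).reshape(h, w), k, phi


# Toy usage: two activation blocks; only the top-left one drives the "score".
rng = np.random.default_rng(1)
fmap = np.zeros((16, 8, 8))
fmap[:8, :4, :4] = rng.random((8, 4, 4)) + 1.0
fmap[8:, 4:, 4:] = rng.random((8, 4, 4)) + 1.0
idx = np.arange(64)
target = ((idx // 8 < 4) & (idx % 8 < 4)).astype(float)


def score(mask):
    """Hypothetical class score: saturating overlap with the target region."""
    return float(np.minimum(mask, 1.0) @ target)


sal, k, phi = saliency(fmap, score)
print(k, sal.shape)
```

Permutation sampling keeps the cost linear in the number of sampled permutations rather than exponential in the number of components. Because the marginal contributions in each permutation sum to score(all components) minus score(no components), the averaged weights satisfy the Shapley efficiency property exactly.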
Yuan et al. studied this question.