Abstract Recent advances in convolutional neural network (CNN) interpretability have led to a wide variety of gradient-based techniques for generating visual attention maps. However, most of these methods require a classification-type architecture and consequently concentrate on classification and categorization tasks. Extending them to generate visual attention maps for other kinds of computer vision models, e.g., variational autoencoders (VAEs), is not trivial. In this paper, we present a method that helps bridge this crucial gap, proposing to compute VAE attention as a means of interpreting the latent space learned by a VAE. We first present methods to generate visual attention maps from the learned latent space, and then show how they can be used in a variety of applications: localizing anomalies in images, including medical imagery, and improving latent space disentanglement. We conduct extensive experiments on a wide variety of benchmark datasets to demonstrate the efficacy of the proposed VAE attention.
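The abstract does not spell out how attention is computed from the latent space, so the following is only a minimal sketch of one plausible gradient-based formulation in the style of Grad-CAM, not the authors' implementation. It assumes a hypothetical toy encoder whose latent means are a linear function of the last convolutional feature maps, so the gradient of each latent mean with respect to the feature maps is simply the head weight and no autodiff library is needed. The names `latent_attention`, `features`, and `head_weights` are illustrative assumptions.

```python
# Hypothetical toy setup (an assumption, not taken from the paper):
# K feature maps of size H x W from an encoder's last convolutional layer,
# and a linear head producing each latent mean mu_i. Because the head is
# linear, d(mu_i)/d(A[k][h][w]) equals the corresponding head weight.

def latent_attention(features, head_weights):
    """Grad-CAM-style attention map averaged over latent dimensions.

    features:     K x H x W nested lists (encoder feature maps A_k).
    head_weights: D x K x H x W nested lists; head_weights[i] is the gradient
                  of latent mean mu_i with respect to the feature maps.
    Returns an H x W nested list.
    """
    K = len(features)
    H = len(features[0])
    W = len(features[0][0])
    D = len(head_weights)
    attention = [[0.0] * W for _ in range(H)]
    for grad in head_weights:
        # Channel weight alpha_k: spatial average of the gradient.
        alphas = [sum(sum(row) for row in grad[k]) / (H * W) for k in range(K)]
        for h in range(H):
            for w in range(W):
                score = sum(alphas[k] * features[k][h][w] for k in range(K))
                # ReLU keeps only locations that raise the latent value;
                # dividing by D averages the maps over latent dimensions.
                attention[h][w] += max(score, 0.0) / D
    return attention


# Two 2x2 feature maps and two latent dimensions.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
head_weights = [
    [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]],
    [[[-1.0, -1.0], [-1.0, -1.0]], [[2.0, 2.0], [2.0, 2.0]]],
]
attention_map = latent_attention(features, head_weights)
print(attention_map)
```

In a real VAE the gradients would come from backpropagating each latent dimension through the encoder rather than from a fixed weight tensor. The averaging over latent dimensions used here is one simple aggregation choice among several.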
Li et al. studied this question.