The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result: for example, how sensitive a prediction of "zebra" is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
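The idea of using directional derivatives to measure concept sensitivity can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The abstract does not say how a CAV is obtained, so the sketch assumes a simple stand-in: the normalized difference between the mean activation of concept examples and the mean activation of random examples. The toy `zebra_logit` function, the two-dimensional activations, and the score defined as the fraction of class examples with a positive directional derivative are all illustrative assumptions.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def concept_direction(concept_acts, random_acts):
    """Assumed stand-in for a CAV: normalized difference of mean activations."""
    mc = mean_vector(concept_acts)
    mr = mean_vector(random_acts)
    return unit([a - b for a, b in zip(mc, mr)])


def directional_derivative(logit_fn, act, direction, eps=1e-4):
    """Central finite difference of the class logit along `direction`."""
    plus = [a + eps * d for a, d in zip(act, direction)]
    minus = [a - eps * d for a, d in zip(act, direction)]
    return (logit_fn(plus) - logit_fn(minus)) / (2 * eps)


def sensitivity_score(logit_fn, class_acts, direction):
    """Fraction of class examples whose logit increases along the concept direction."""
    positives = sum(
        1 for act in class_acts
        if directional_derivative(logit_fn, act, direction) > 0
    )
    return positives / len(class_acts)


def zebra_logit(act):
    """Hypothetical classifier head; dimension 0 loosely plays the role of 'stripes'."""
    return 2.0 * math.tanh(act[0]) - 0.5 * act[1]


# Toy layer activations (made up for illustration only).
striped_acts = [[1.0, 0.1], [1.2, -0.1], [0.9, 0.0]]
random_acts = [[0.0, 0.1], [-0.1, -0.1], [0.1, 0.0]]
zebra_acts = [[0.8, 0.3], [1.1, -0.2], [0.5, 0.4], [1.4, 0.0]]

cav = concept_direction(striped_acts, random_acts)
score = sensitivity_score(zebra_logit, zebra_acts, cav)
reverse_score = sensitivity_score(zebra_logit, zebra_acts, [-c for c in cav])
print(cav, score, reverse_score)
```

In this toy setup the concept direction lines up with the first activation dimension, along which the "zebra" logit always increases, so every directional derivative is positive. In a real network the derivative would come from automatic differentiation through the layers above the chosen activation, not from finite differences on a hand-written function.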
Kim et al. studied this question.