Spatio-temporal graph convolutional networks (STGCNs) have become popular recently because they can handle structured data with dynamic temporal variations. However, the lack of interpretability limits the potential application of STGCNs. Gradient-based class activation maps (Grad-CAM) are a popular technique for interpreting convolutional neural networks on grid-structured data such as images. In this paper, we design an extension of Grad-CAM for spatio-temporal graph convolution (STG-Grad-CAM) to improve the interpretability of STGCNs. As a proof of concept, we provide results for a skeleton-based activity recognition task. We show which body joints are responsible for a particular task and how their temporal dynamics contribute to the classification output. We present a brief study of the interpretability of a recognition task under changes to the model depth and to the training and testing protocol. To assess the efficacy of STG-Grad-CAM, we compute its faithfulness to the model, measured by the impact of occluding graph nodes. For explainability of the STGCN, we compute the contrastivity of the model for different classes based on the outcome of STG-Grad-CAM. In the cross-person setting, we observe better contrastivity than in the cross-view setting.
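To make the idea concrete, the following is a minimal sketch of how the standard Grad-CAM computation might carry over to a spatio-temporal graph feature map. It is an illustration under stated assumptions, not the authors' implementation: the tensor layout `(channels, time, nodes)`, the function names `stg_grad_cam` and `joint_importance`, and the choice to average gradients over both time and nodes are all hypothetical.

```python
import numpy as np


def stg_grad_cam(activations, gradients):
    """Grad-CAM-style heatmap over time steps and graph nodes.

    Both inputs are assumed to have shape (C, T, V): channels, time
    steps, and graph nodes (body joints). `gradients` holds the
    derivative of the target class score with respect to `activations`.
    """
    # One weight per channel: the gradient averaged over time and nodes.
    weights = gradients.mean(axis=(1, 2))                  # shape (C,)
    # Weighted sum of the channel maps, contracting the channel axis.
    heatmap = np.tensordot(weights, activations, axes=1)   # shape (T, V)
    # ReLU keeps only the evidence that supports the target class.
    return np.maximum(heatmap, 0.0)


def joint_importance(heatmap):
    """Average the (T, V) heatmap over time to get one score per joint."""
    return heatmap.mean(axis=0)                            # shape (V,)
```

Given the feature map of a chosen layer and the matching gradients from any autodiff framework, `stg_grad_cam` returns one relevance value per time step and joint, and `joint_importance` collapses that map into a per-joint summary.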
Das et al. studied this question.