August 20, 2024 · Open Access

Interpretability in Video-based Human Action Recognition: Saliency Maps and GradCAM in 3D Convolutional Neural Networks

Abstract

Interpretability plays a vital role in understanding complex deep learning models by providing transparency and insights. It addresses the black-box nature of these models, aids in human-in-the-loop systems, enhances model development, and supports education. However, existing interpretability algorithms in computer vision primarily target images, leaving a gap for video applications. In this article, we emphasize the importance of interpretability in video-based Human Action Recognition (HAR). We extend existing 2D interpretability techniques to the 3D domain, specifically focusing on saliency maps and gradient-weighted class activation maps. The proposed interpretability system is then employed in the analysis of a well-known HAR dataset to better understand action recognition in videos.
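To make the 2D-to-3D extension concrete, the following is a minimal sketch of the two techniques named in the abstract, written from their standard published definitions with an added time axis. It is not the authors' implementation. It assumes the gradients have already been computed by an autodiff framework, and the function names `grad_cam_3d` and `saliency_3d` and the `(K, T, H, W)` / `(C, T, H, W)` array layouts are hypothetical choices made for this illustration.

```python
import numpy as np


def grad_cam_3d(activations, gradients):
    """Grad-CAM over a spatio-temporal feature volume (illustrative sketch).

    activations: feature maps of one 3D conv layer, shape (K, T, H, W).
    gradients:   gradient of the class score w.r.t. those maps, same shape.
    Returns a (T, H, W) map scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if activations.ndim != 4 or activations.shape != gradients.shape:
        raise ValueError("expected two arrays of identical shape (K, T, H, W)")

    # Channel weights: average each channel's gradient over time and space.
    weights = gradients.mean(axis=(1, 2, 3))            # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)    # shape (T, H, W)
    # Keep only features with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def saliency_3d(input_gradients):
    """Saliency map for a video clip (illustrative sketch).

    input_gradients: gradient of the class score w.r.t. the input clip,
    shape (C, T, H, W). Returns the per-voxel maximum absolute gradient
    across colour channels, shape (T, H, W).
    """
    grads = np.abs(np.asarray(input_gradients, dtype=float))
    return grads.max(axis=0)


# Usage with random stand-in data (no real model involved).
rng = np.random.default_rng(0)
demo_cam = grad_cam_3d(rng.normal(size=(8, 4, 7, 7)), rng.normal(size=(8, 4, 7, 7)))
demo_sal = saliency_3d(rng.normal(size=(3, 16, 112, 112)))
```

In practice the Grad-CAM volume is coarser than the input clip and would need upsampling along time and space before being overlaid on the frames. That step is omitted here.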

Cite This Study

Fernandez et al. (2024) studied this question.

synapsesocial.com/papers/68e5b8abb6db64358755187b
https://doi.org/10.5617/nmi.10587