Emotion identification via computer vision has made steady progress over the last few years. Although still images have been the gold standard for the past two decades, video is increasingly common. Video is particularly well suited to the study of emotions because it allows them to be treated as spatiotemporal phenomena. In particular, detecting anxiety among Mexican students is a key element for improving their learning in the classroom. In pursuit of this goal, we addressed three challenges. First, the scarcity of specialized datasets for this task prompted us to develop an experimental protocol to generate a dedicated dataset. Second, we conducted a thorough study of the appropriate number of emotional intensity levels. Third, we designed a suitable deep learning architecture. Our main results include a new dataset labeled with three different emotion levels, together with appropriate ConvNet architectures and a study of various intensity levels. The best architecture achieved an F1-score of 0.7620 across five intensity levels and provides an adequate baseline for multiclass classification.
Moreno-Armendáriz et al. studied this question.