A pre-trained Transformer-based model achieved an average accuracy of 0.88 for arousal recognition from ECG signals, significantly outperforming a pre-trained CNN approach.
A Transformer-based self-supervised learning approach effectively extracts contextualized representations from ECG signals, achieving state-of-the-art performance for ECG-based emotion recognition on the AMIGOS dataset.
Arousal accuracy: 0.88 (pre-trained Transformer) vs. 0.85 (pre-trained 1D-CNN)
p-value: p < 0.01
To exploit representations of time-series signals, such as physiological signals, it is essential that these representations capture relevant information from the whole signal. In this work, we propose to use a Transformer-based model to process electrocardiograms (ECG) for emotion recognition. The attention mechanisms of the Transformer can be used to build contextualized representations of a signal, giving more importance to its relevant parts. These representations may then be processed by a fully-connected network to predict emotions. To overcome the relatively small size of datasets with emotional labels, we employ self-supervised learning. We gathered several ECG datasets without emotion labels to pre-train our model, which we then fine-tuned for emotion recognition on the AMIGOS dataset. We show that our approach reaches state-of-the-art performance for emotion recognition using ECG signals on AMIGOS. More generally, our experiments show that Transformers and pre-training are promising strategies for emotion recognition with physiological signals.
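The pipeline described above (attention builds a contextualized representation of the ECG, then a fully-connected network predicts the emotion) can be illustrated with a minimal sketch. It assumes the ECG has already been cut into segments and embedded as small vectors, uses a single attention head with identity query/key/value projections, mean-pools the contextualized vectors, and applies a one-layer fully-connected head with a sigmoid to produce a hypothetical "high arousal" probability. All function names, dimensions, and the pooling choice are illustrative assumptions, not the authors' architecture.

```python
import math
import random


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product self-attention (identity Q/K/V projections).

    tokens: list of d-dimensional vectors, one per ECG segment (hypothetical embeddings).
    Each output vector is a weighted average of all input vectors, with larger
    weights on segments whose embeddings are more similar to the query segment.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [dot(q, k) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(vectors):
    # Average the contextualized segment vectors into one signal-level vector.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def arousal_probability(tokens, head_weights, bias):
    """Contextualize segments, pool them, then apply a linear layer and a sigmoid."""
    pooled = mean_pool(self_attention(tokens))
    logit = dot(head_weights, pooled) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    # 8 hypothetical ECG segment embeddings of dimension 4.
    segments = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
    head_w = [rng.gauss(0, 1) for _ in range(4)]
    p = arousal_probability(segments, head_w, bias=0.0)
    print(f"P(high arousal) = {p:.3f}")
```

A full Transformer encoder would instead learn separate query, key, and value projections, use several heads and layers, and add positional information; these are omitted here to keep the attention step itself visible. In the approach summarized above, such an encoder would first be pre-trained on unlabeled ECG and then fine-tuned together with the classification head on labeled data.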
Vazquez-Rodriguez et al. conducted a study in emotion recognition (n=40). A pre-trained Transformer-based model was compared with a pre-trained 1D-CNN on arousal accuracy (p < 0.01).