What is the clinical evidence from this study?

Study design: Other. Population: AMIGOS participants, emotion recognition (n=40). Intervention: pre-trained Transformer-based model vs. pre-trained 1D-CNN. Primary outcome: arousal accuracy (p<0.01).

August 21, 2022 · Open Access

Transformer-Based Self-Supervised Learning for Emotion Recognition

Q: What are the key findings of this study?

A pre-trained Transformer-based model achieved an average accuracy of 0.88 for arousal recognition from ECG signals, significantly outperforming a pre-trained CNN approach.

Q: What does this research mean for the field?

A pre-trained Transformer-based model using self-supervised learning achieves state-of-the-art performance for emotion recognition from ECG signals on the AMIGOS dataset, significantly outperforming pre-trained CNN approaches. Novelty: methodological. Consensus alignment: neutral.

Key Result

A pre-trained Transformer-based model achieved an average accuracy of 0.88 for arousal recognition from ECG signals, significantly outperforming a pre-trained CNN approach.

Structured PICO

Population

ECG datasets including AMIGOS (n=40 subjects) for fine-tuning, and ASCERTAIN, DREAMER, PsPM-FR, PsPM-HRM5, PsPM-RRM1-2, and PsPM-VIS for pre-training

Intervention

Transformer-based self-supervised learning model for processing ECG signals
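This summary does not say which self-supervised pretext task was used for pre-training. As an illustration only, the sketch below assumes a common choice, masked-signal reconstruction: hide a random fraction of unlabeled ECG samples and score a model only on how well it restores them. `mask_signal`, `masked_mse`, and `mask_ratio` are hypothetical names, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def mask_signal(signal: Sequence[float], mask_ratio: float = 0.15,
                seed: int = 0) -> Tuple[List[float], List[int]]:
    """Zero out a random subset of samples (assumed pretext task).

    Returns the masked copy and the sorted indices that were hidden.
    """
    if not 0.0 < mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in (0, 1)")
    rng = random.Random(seed)  # seeded so the mask is reproducible
    n_mask = max(1, int(round(mask_ratio * len(signal))))
    masked_idx = sorted(rng.sample(range(len(signal)), n_mask))
    masked = list(signal)
    for i in masked_idx:
        masked[i] = 0.0
    return masked, masked_idx


def masked_mse(prediction: Sequence[float], target: Sequence[float],
               masked_idx: Sequence[int]) -> float:
    """Mean squared reconstruction error over the masked positions only."""
    return sum((prediction[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)


# Usage on a toy unlabeled segment (values are made up):
signal = [0.1 * i for i in range(1, 21)]
masked, idx = mask_signal(signal, mask_ratio=0.25, seed=1)
loss_if_perfect = masked_mse(signal, signal, idx)
loss_if_copying_input = masked_mse(masked, signal, idx)
```

Because no emotion labels are required, a loss of this kind can be computed on the unlabeled pre-training datasets. The labeled AMIGOS data would then be needed only for fine-tuning.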

Comparator

Models without pre-training and other state-of-the-art machine learning approaches (e.g., pre-trained CNN)

Outcome

Mean accuracy and mean F1-score for arousal and valence prediction
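A minimal sketch of how "mean accuracy" and "mean F1-score" could be computed. It assumes binary high/low labels per dimension (arousal or valence), F1 taken on the positive ("high") class, and averaging over evaluation folds or repeated runs. The summary does not state these details, and the fold data below is invented.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class, written as 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # 0.0 by convention if no positives at all


def mean_scores(folds: List[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """Average accuracy and F1 over (y_true, y_pred) pairs, one per fold or run."""
    return (mean(accuracy(t, p) for t, p in folds),
            mean(f1_binary(t, p) for t, p in folds))


# Two hypothetical folds of high (1) / low (0) arousal labels:
folds = [([1, 0, 1, 1], [1, 0, 0, 1]),
         ([0, 0, 1, 1], [0, 1, 1, 1])]
mean_acc, mean_f1 = mean_scores(folds)
```

Reporting F1 alongside accuracy matters when high and low classes are imbalanced. On skewed labels, a model that always predicts the majority class can reach high accuracy but scores poorly on F1.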

A Transformer-based self-supervised learning approach effectively extracts contextualized representations from ECG signals, achieving state-of-the-art performance in emotion recognition.

Main Result

Arousal accuracy: 0.88 vs 0.85 (pre-trained Transformer vs. pre-trained 1D-CNN)

p-value: p<0.01

Limitations

Limited availability of labeled training data for physiological signals
Longer signal segmentation might cover fluctuating emotional states, making it harder to characterize emotion
Longer segments require more complex models which are harder to train with restricted amounts of labeled data
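The segment-length trade-off in the last two limitations can be made concrete with a simple fixed-window segmentation. This is a sketch under assumed values: the 128 Hz sampling rate, 60 s recording, and window lengths are hypothetical and are not taken from the study. It shows that longer windows leave fewer labeled examples per recording.

```python
from typing import List, Sequence


def segment(signal: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Split a 1D signal into fixed-length windows.

    A trailing partial window is dropped; stride < window gives overlap.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [list(signal[start:start + window])
            for start in range(0, len(signal) - window + 1, stride)]


fs = 128                       # hypothetical sampling rate in Hz
ecg = [0.0] * (60 * fs)        # placeholder for a 60 s recording
short = segment(ecg, window=10 * fs, stride=10 * fs)    # 10 s windows, no overlap
long = segment(ecg, window=20 * fs, stride=20 * fs)     # 20 s windows, no overlap
overlap = segment(ecg, window=10 * fs, stride=5 * fs)   # 10 s windows, 50% overlap
```

Doubling the window halves the number of non-overlapping examples. Overlapping windows recover some examples, but neighboring segments then share samples and are not independent.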

Abstract

In order to exploit representations of time-series signals, such as physiological signals, it is essential that these representations capture relevant information from the whole signal. In this work, we propose to use a Transformer-based model to process electrocardiograms (ECG) for emotion recognition. Attention mechanisms of the Transformer can be used to build contextualized representations for a signal, giving more importance to relevant parts. These representations may then be processed with a fully-connected network to predict emotions. To overcome the relatively small size of datasets with emotional labels, we employ self-supervised learning. We gathered several ECG datasets with no labels of emotion to pre-train our model, which we then fine-tuned for emotion recognition on the AMIGOS dataset. We show that our approach reaches state-of-the-art performance for emotion recognition using ECG signals on AMIGOS. More generally, our experiments show that Transformers and pre-training are promising strategies for emotion recognition with physiological signals.
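The abstract's pipeline (attention builds contextualized representations, then a fully-connected network predicts emotion) can be sketched in plain Python. Several assumptions apply here. The model uses single-head scaled dot-product self-attention. The input is a sequence of feature vectors, since how the raw ECG is embedded is not described in this summary. Mean pooling followed by a single sigmoid unit stands in for the classifier head. All weights are hypothetical and untrained. This is an illustration of the mechanism, not the authors' architecture.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over a sequence x (T x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]                            # keys transposed: d_k x T
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]   # T x T
    weights = [softmax(row) for row in scores]                      # each row sums to 1
    return matmul(weights, v), weights                              # contextualized: T x d_v


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_high_arousal(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                         w_out: List[float], b_out: float) -> float:
    """Contextualize, mean-pool over time (assumed), then one dense sigmoid unit."""
    context, _ = self_attention(x, wq, wk, wv)
    pooled = [sum(col) / len(context) for col in zip(*context)]
    logit = sum(p * w for p, w in zip(pooled, w_out)) + b_out
    return sigmoid(logit)


# Toy usage: three hypothetical 2-d feature vectors and identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
context, weights = self_attention(x, eye, eye, eye)
prob = predict_high_arousal(x, eye, eye, eye, w_out=[0.5, -0.5], b_out=0.0)
```

Each row of `weights` shows how much one position draws on every other position. This is the "giving more importance to relevant parts" the abstract describes. In practice the projections and head would be learned, first through pre-training and then through fine-tuning on labeled data.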
