September 26, 2010 · Open Access

Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling

Abstract

In this paper, we apply a context-sensitive technique for multimodal emotion recognition based on feature-level fusion of acoustic and visual cues. We use bidirectional Long Short-Term Memory (BLSTM) networks which, unlike most other emotion recognition approaches, exploit long-range contextual information for modeling the evolution of emotion within a conversation. We focus on recognizing dimensional emotional labels, which enables us to classify both prototypical and nonprototypical emotional expressions contained in a large audiovisual database. Subject-independent experiments on various classification tasks reveal that the BLSTM network approach generally prevails over standard classification techniques such as Hidden Markov Models or Support Vector Machines, and achieves F1-measures of the order of 72 %, 65 %, and 55 % for discriminating three clusters in emotional space, three levels of valence, and three levels of activation, respectively.

Index Terms: emotion recognition, multimodality, long short-term memory, hidden Markov models, context modeling
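The two ideas the abstract combines, feature-level fusion and bidirectional LSTM context modeling, can be illustrated with a minimal pure-Python sketch. It is not the authors' network: the helper names (`fuse_features`, `LSTMCell`, `blstm`, `classify`), the feature dimensions, the simplified LSTM cell without peephole connections, and the random, untrained weights are all hypothetical assumptions chosen for illustration. Acoustic and visual vectors of each time step (for example, a frame or a conversational turn) are concatenated into one joint vector. The sequence is then processed once forward and once backward, so the output at every step depends on both past and future context. A softmax layer finally maps each step to three class probabilities.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_features(acoustic, visual):
    """Feature-level (early) fusion: concatenate the acoustic and visual
    vectors of each time step into one joint feature vector."""
    if len(acoustic) != len(visual):
        raise ValueError("modalities must be aligned to the same time steps")
    return [list(a) + list(v) for a, v in zip(acoustic, visual)]


class LSTMCell:
    """Simplified LSTM cell (no peepholes), randomly initialised and untrained."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        n_cols = n_in + n_hidden + 1  # current input, previous hidden state, bias
        # One weight matrix each for the input, forget and output gates
        # and for the candidate cell input.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_cols)]
                   for _ in range(n_hidden)]
            for gate in ("i", "f", "o", "g")
        }

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) for row in self.W[gate]]

    def step(self, x, h, c):
        z = list(x) + list(h) + [1.0]
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        g = [math.tanh(a) for a in self._affine("g", z)]
        # The memory cell keeps a gated copy of its old content plus gated new input.
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c


def run_lstm(cell, seq):
    """Run one LSTM over a sequence and return its hidden state at every step."""
    h, c = [0.0] * cell.n_hidden, [0.0] * cell.n_hidden
    outputs = []
    for x in seq:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


def blstm(fwd, bwd, seq):
    """Forward pass over the sequence, backward pass over its reverse; the two
    hidden states of each time step are concatenated, so every output sees
    both past and future context."""
    forward = run_lstm(fwd, seq)
    backward = run_lstm(bwd, seq[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def classify(states, W_out):
    """Linear output layer plus softmax, giving class probabilities per step."""
    return [softmax([sum(w * s for w, s in zip(row, st)) for row in W_out])
            for st in states]


# Hypothetical toy setup: 6 time steps, 4 acoustic and 3 visual features, 3 classes.
rng = random.Random(0)
T, n_audio, n_video, n_hidden, n_classes = 6, 4, 3, 5, 3
acoustic = [[rng.gauss(0, 1) for _ in range(n_audio)] for _ in range(T)]
visual = [[rng.gauss(0, 1) for _ in range(n_video)] for _ in range(T)]
fused = fuse_features(acoustic, visual)
fwd = LSTMCell(n_audio + n_video, n_hidden, rng)
bwd = LSTMCell(n_audio + n_video, n_hidden, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_hidden)]
         for _ in range(n_classes)]
probs = classify(blstm(fwd, bwd, fused), W_out)
print(len(probs), len(probs[0]))  # 6 3
```

A consequence of the bidirectional design is visible here: altering the last time step leaves the forward half of the first step's state unchanged but changes its backward half. A unidirectional LSTM cannot provide this kind of future context. Training the weights, for instance with backpropagation through time, is outside the scope of this sketch.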