A Deep Bidirectional Long Short-Term Memory Recurrent Neural Network based multimodal affect prediction framework achieved a concordance correlation coefficient of 0.747 for arousal and 0.609 for valence on the test set.
A DBLSTM-RNN based multimodal framework using audio, video, and physiological features achieved promising results for predicting affective dimensions.
This paper presents our system design for the Audio-Visual Emotion Challenge (AV+EC 2015). Besides the baseline features, we extract from audio the functionals on low-level descriptors (LLDs) obtained via the YAAFE toolbox, and from video the Local Phase Quantization from Three Orthogonal Planes (LPQ-TOP) features. From the physiological signals, we extract 52 electrocardiogram (ECG) features and 22 electrodermal activity (EDA) features from various analysis domains. The extracted features, along with the AV+EC 2015 baseline features of audio, ECG or EDA, are concatenated for a further feature selection step, in which the concordance correlation coefficient (CCC), instead of the usual Pearson correlation coefficient (CC), is used as the objective function. In addition, offsets between the features and the arousal/valence labels are considered in both feature selection and modeling of the affective dimensions. For the fusion of multimodal features, we propose a Deep Bidirectional Long Short-Term Memory Recurrent Neural Network (DBLSTM-RNN) based multimodal affect prediction framework, in which the initial predictions from the single modalities via the DBLSTM-RNNs are first smoothed with Gaussian smoothing, then input into a second layer of DBLSTM-RNN for the final prediction of the affective state. Experimental results show that our proposed features and the DBLSTM-RNN based fusion framework obtain very promising results. On the development set, the obtained CCC is up to 0.824 for arousal and 0.688 for valence, and on the test set, the CCC is 0.747 for arousal and 0.609 for valence.
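To make the evaluation metric concrete, the following is a minimal sketch of the CCC next to the Pearson CC, together with a simple way of applying a feature/label offset. It is not the authors' code: the function names (`concordance_cc`, `pearson_cc`, `shift_labels`) and the convention that labels lag the features by a non-negative number of frames are assumptions made for illustration. The CCC is computed from its standard definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population statistics.

```python
import numpy as np


def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two 1-D sequences."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    # Population covariance and variances (ddof=0) keep the terms consistent.
    cov = np.mean((pred - mean_p) * (gold - mean_g))
    return 2.0 * cov / (pred.var() + gold.var() + (mean_p - mean_g) ** 2)


def pearson_cc(pred, gold):
    """Pearson correlation coefficient, for comparison with the CCC."""
    return float(np.corrcoef(pred, gold)[0, 1])


def shift_labels(features, labels, offset):
    """Pair feature frame t with label frame t + offset (assumes offset >= 0).

    Frames without a partner at either end are dropped.
    """
    if offset == 0:
        return features, labels
    return features[:-offset], labels[offset:]


if __name__ == "__main__":
    gold = np.array([0.0, 1.0, 2.0, 3.0])
    pred = gold + 1.0  # perfectly correlated, but biased by a constant
    print("Pearson CC:", pearson_cc(pred, gold))
    print("CCC:", concordance_cc(pred, gold))
```

Unlike the Pearson CC, the CCC penalizes a prediction that follows the shape of the gold-standard trace but is shifted or scaled, which is one reason it can be preferable as a selection objective when it is also the evaluation metric.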
He et al. studied affective dimension prediction, comparing a Deep Bidirectional Long Short-Term Memory Recurrent Neural Network (DBLSTM-RNN) based multimodal affect prediction framework against baseline features, evaluated by the concordance correlation coefficient (CCC) for arousal and valence. On the test set, the framework achieved a CCC of 0.747 for arousal and 0.609 for valence.