October 20, 2017

Continuous Multimodal Emotion Prediction Based on Long Short Term Memory Recurrent Neural Network


Abstract

Continuous dimensional emotion can depict the subtlety and complexity of emotional change, and predicting it is an inherently challenging problem that is attracting growing attention. This paper presents our system for automatic prediction of dimensional emotional state for the Audio-Visual Emotion Challenge (AVEC 2017), which uses multiple features and fusion across all available modalities. Besides the baseline features provided by the organizers, we also extract other acoustic feature sets, appearance features, and deep visual features as complementary features. For each emotion dimension, a separate Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is trained on each feature type, taking annotation delay and temporal pooling into account. To counter overfitting, robust models are carefully selected for each individual model. Finally, multimodal fusion is achieved at the decision level by applying Support Vector Regression (SVR) to the estimates from the different feature sets. The experimental results indicate that our extracted features improve performance and that our system design achieves very promising results in terms of the Concordance Correlation Coefficient (CCC), outperforming the baseline system on the test set: 0.599 vs. 0.375 for arousal, 0.721 vs. 0.466 for valence, and 0.295 vs. 0.246 for liking.
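
The results above are reported as CCC scores. As a minimal sketch of how that metric is usually computed (the standard definition, using population variance and covariance), the Python function below may help; the function name `concordance_cc` and the sample values are illustrative assumptions, not code or data from the paper.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Standard definition (not taken from the paper's code):
        CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_pred - mean_gold)^2)
    with population (divide-by-N) variance and covariance.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = fmean(pred)
    mean_g = fmean(gold)

    # Population variance of each sequence and their covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are the same constant; the ratio is undefined.
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


# Hypothetical values, for illustration only.
gold = [1.0, 2.0, 3.0]
print(concordance_cc(gold, gold))             # identical sequences
print(concordance_cc([2.0, 3.0, 4.0], gold))  # same shape, shifted by a constant
```

Unlike Pearson correlation, CCC penalizes a constant offset between prediction and annotation: the shifted sequence in the second call is perfectly correlated with the gold labels yet scores below 1.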


Cite This Study

Huang et al. (2017) studied this question.

synapsesocial.com/papers/6a0ea0ea06ecbe833447a1c8 https://doi.org/10.1145/3133944.3133946
