Class-based emotion recognition from speech, as performed in most works up to now, entails many restrictions for practical applications. Human emotion is a continuum, and an automatic emotion recognition system must be able to recognise it as such. We present a novel approach for continuous emotion recognition based on Long Short-Term Memory Recurrent Neural Networks, which model long-range dependencies between observations and thus outperform techniques like Support-Vector Regression. Transferring the innovative concept of additionally modelling emotional history to the classification of discrete levels for the emotional dimensions “valence” and “activation”, we also apply Conditional Random Fields, which prevail over the commonly used Support-Vector Machines. Experiments conducted on data recorded while humans interacted with a Sensitive Artificial Listener show that, for activation, the derived classifiers perform as well as human annotators.
Wöllmer et al. studied this question.