Speech emotion recognition is a challenging problem, partly because it is unclear which features are effective for the task. In this paper we propose to utilize deep neural networks (DNNs) to extract high-level features from raw data and show that they are effective for speech emotion recognition. We first produce an emotion state probability distribution for each speech segment using DNNs. We then construct utterance-level features from the segment-level probability distributions. These utterance-level features are then fed into an extreme learning machine (ELM), a simple and efficient single-hidden-layer neural network, to identify utterance-level emotions. The experimental results demonstrate that the proposed approach effectively learns emotional information from low-level features and leads to a 20% relative accuracy improvement compared to state-of-the-art approaches.
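The last two stages of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the segment-level probabilities are assumed to be already available from some DNN, the choice of max/min/mean as utterance-level statistics is an assumption, and the names `utterance_features` and `ELM`, the hidden-layer size, and the ridge term are hypothetical. The ELM follows the general recipe of fixed random hidden weights with output weights solved by regularized least squares.

```python
import numpy as np


def utterance_features(segment_probs):
    """Aggregate segment-level emotion probabilities into one utterance vector.

    segment_probs: array of shape (n_segments, n_emotions).
    The max/min/mean statistics are an illustrative assumption.
    """
    p = np.asarray(segment_probs, dtype=float)
    return np.concatenate([p.max(axis=0), p.min(axis=0), p.mean(axis=0)])


class ELM:
    """Single-hidden-layer network: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=32, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden  # hypothetical default
        self.ridge = ridge        # small regularizer for numerical stability
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)
        n_classes = int(y.max()) + 1
        # Hidden weights are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights: solve (H^T H + ridge * I) beta = H^T T.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return np.argmax(H @ self.beta, axis=1)


# Toy usage: four utterances, each a list of per-segment probabilities
# over two emotions (made-up numbers, not data from the paper).
utterances = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
    [[0.8, 0.2], [0.9, 0.1]],
    [[0.2, 0.8], [0.1, 0.9], [0.3, 0.7]],
    [[0.3, 0.7], [0.2, 0.8]],
]
labels = [0, 0, 1, 1]

X = np.stack([utterance_features(u) for u in utterances])
model = ELM().fit(X, labels)
predictions = model.predict(X)
```

Because only the output weights are fitted, and in closed form, training reduces to a single linear solve, which is what makes the ELM stage cheap compared with training the DNN that produces the segment-level distributions.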
Kun Han
National University of Defense Technology
Dong Yu
Seattle University
Ivan Tashev
Microsoft (United States)
Han et al. (Sun,) studied this question.
synapsesocial.com/papers/6a126168965b75813866e873 — DOI: https://doi.org/10.21437/interspeech.2014-57