Human emotion is a concrete form of human communication, and research on emotion recognition is steadily increasing. In recent years, researchers have paid more attention to multi-modal emotion recognition. This paper presents a deep neural network for emotion recognition based on the speech spectrogram. Spectrograms contain comprehensive information about speech and are useful for emotion recognition. We combine a convolutional neural network (CNN) with a Long Short-Term Memory (LSTM) network: the CNN extracts features from the spectrogram, and the LSTM preserves the temporal information in the speech signal. The speech information extracted from the spectrogram is then used for the emotion recognition task. This study uses the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset from the University of Southern California. With speech spectrograms as input and six emotion categories as targets, the final weighted accuracy is 61% and the unweighted accuracy is 56%.
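The abstract reports two accuracy figures without defining them. A minimal sketch follows, assuming the convention common in speech emotion recognition: weighted accuracy is the overall fraction of correctly classified utterances, and unweighted accuracy is the mean of the per-class recalls. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of utterances classified correctly (assumed definition)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally (assumed definition)."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Hypothetical toy labels with an imbalanced class distribution.
y_true = ["neutral"] * 4 + ["angry"] * 2
y_pred = ["neutral", "neutral", "neutral", "angry", "angry", "sad"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

Under these definitions, the two numbers diverge when classes are imbalanced and the less frequent classes are recognized less reliably. That would be consistent with the weighted figure exceeding the unweighted one.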
Li et al. (Fri,) studied this question.