This paper investigates the ability of Neural Network (NN) models to recognize speech emotions. NN models such as the Radial Basis Function Network (RBFN), the Multilayer Perceptron (MLP), and the Probabilistic Neural Network (PNN) are simulated extensively to determine their Speech Emotion Recognition (SER) accuracy for emotional states such as anger, happiness, sadness, and boredom. The utterances for these states are chosen from the standard Berlin (EMO-DB) database. Efficient cepstral-domain vocal tract system features, namely the Linear Predictive Cepstral Coefficients (LPCC), the Mel Frequency Cepstral Coefficients (MFCC), and the Perceptual Linear Prediction (PLP) coefficients, are tested for their ability to discriminate emotions with the proposed setup. These features are extracted at the frame level and clustered into their corresponding Vector Quantized (VQ) coefficients to remove redundant information before the chosen classifiers are simulated. The NN-based identification models achieve the desired level of SER accuracy, as these classifiers remain effective for low-dimensional feature sets. An improved accuracy of 83% is observed with the PNN using the LPCCVQ feature sets, compared to 82% with the RBFN and 78% with the MLP. Our results show that, among the derived feature sets, the LPCCVQ is the most reliable in characterizing the intended speech emotions, while the PNN outperforms the other NN classifiers.
Palo et al. studied this question.
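The pipeline summarized in the abstract (frame-level features, VQ compression, then an NN classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the frames are synthetic Gaussian vectors rather than LPCC features from EMO-DB, plain k-means stands in for the VQ step (the abstract does not name the clustering algorithm), the PNN is written in its textbook form as a Gaussian Parzen-window classifier, and the names `vq_codebook`, `pnn_predict`, `make_utterance`, `dataset`, as well as the values of `k` and `sigma`, are hypothetical.

```python
import numpy as np


def vq_codebook(frames, k, n_iter=20, seed=0):
    """Compress (n_frames, dim) frame features into k codewords via plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from k distinct frames.
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every codeword.
        dist2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dist2.argmin(axis=1)
        for j in range(k):
            members = frames[nearest == j]
            if len(members) > 0:  # keep the old codeword if a cluster empties
                codebook[j] = members.mean(axis=0)
    # Sort by the first coefficient so flattened codebooks are comparable.
    return codebook[np.argsort(codebook[:, 0])]


def pnn_predict(train_x, train_y, test_x, sigma=1.0):
    """Textbook PNN: average a Gaussian kernel per class, pick the largest."""
    classes = np.unique(train_y)
    dist2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-dist2 / (2.0 * sigma ** 2))
    scores = np.stack(
        [kernel[:, train_y == c].mean(axis=1) for c in classes], axis=1
    )
    return classes[scores.argmax(axis=1)]


def make_utterance(rng, centre, n_frames=50, dim=4):
    """Synthetic stand-in for frame-level cepstral features (assumption)."""
    return rng.normal(loc=centre, scale=1.0, size=(n_frames, dim))


rng = np.random.default_rng(42)
# Two toy "emotions" with well-separated feature means (illustrative only).
centres = {"anger": 0.0, "sadness": 5.0}


def dataset(n_per_class):
    """Build one flattened VQ vector per synthetic utterance."""
    x, y = [], []
    for label, centre in centres.items():
        for _ in range(n_per_class):
            frames = make_utterance(rng, centre)
            x.append(vq_codebook(frames, k=2).ravel())
            y.append(label)
    return np.array(x), np.array(y)


train_x, train_y = dataset(10)
test_x, test_y = dataset(5)
preds = pnn_predict(train_x, train_y, test_x, sigma=2.0)
accuracy = float((preds == test_y).mean())
print(f"toy accuracy: {accuracy:.2f}")
```

The sketch shows why VQ suits these classifiers: each utterance, regardless of its frame count, is reduced to a fixed-length, low-dimensional vector (here 2 codewords of 4 coefficients). The toy accuracy says nothing about the figures reported in the abstract, since the data are synthetic and trivially separable.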