What question did this study set out to answer?

The study investigates how well different neural network models can recognize emotions from speech.

February 24, 2026 | Open Access

Comparative Analysis of Neural Networks for Speech Emotion Recognition

Key Points

Simulated three neural network models: RBFN, MLP, and PNN
Used speech data from the Berlin EMO-DB database
Extracted LPCC, MFCC, and PLP features and compressed them with vector quantization (VQ)
Tested models for their accuracy in emotion classification
Observed 83% accuracy with PNN using LPCCVQ features
Achieved 82% accuracy with RBFN
Recorded 78% accuracy with MLP
LPCCVQ features were found to be the most reliable for emotion characterization
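The LPCCVQ feature set that performed best can be illustrated with a minimal sketch. This is not the authors' code. The function names (`frame_signal`, `lpc`, `lpc_to_cepstrum`, `lpccvq`) are hypothetical, and several choices are assumptions: 16 kHz audio with 25 ms frames and a 10 ms hop, LPC order 12, 12 cepstral coefficients, an 8-codeword codebook built with plain k-means, and flattening the sorted codebook into one fixed-length vector per utterance. LPC coefficients come from the autocorrelation method with the Levinson-Durbin recursion and are then converted to cepstral coefficients with the standard LPC-to-cepstrum recursion.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping Hamming-windowed frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Predictor coefficients a_1..a_p (x[n] ~ sum_k a_k x[n-k]) via Levinson-Durbin."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]
    if err <= 0:  # silent frame: no meaningful predictor
        return a[1:]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k  # updated prediction-error energy
    return a[1:]


def lpc_to_cepstrum(a: np.ndarray, n_ceps: int) -> np.ndarray:
    """Cepstral coefficients c_1..c_n of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] (log-gain term) is not used here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpccvq(x, frame_len=400, hop=160, order=12, n_ceps=12, k=8, n_iter=20, seed=0):
    """One fixed-length LPCC-VQ vector (k * n_ceps values) per utterance."""
    frames = frame_signal(x, frame_len, hop)
    feats = np.array([lpc_to_cepstrum(lpc(f, order), n_ceps) for f in frames])
    rng = np.random.default_rng(seed)
    codebook = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(n_iter):  # plain k-means (Lloyd iterations)
        d = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)  # nearest codeword for every frame
        for j in range(k):
            members = feats[labels == j]
            if len(members):  # leave an empty cluster's codeword unchanged
                codebook[j] = members.mean(axis=0)
    # Assumption: order codewords by first coefficient so vectors line up across utterances
    codebook = codebook[np.argsort(codebook[:, 0])]
    return codebook.ravel()
```

For a one-second utterance at 16 kHz, `lpccvq(x)` returns a 96-value vector regardless of how many frames the utterance has, which is what lets a fixed-input classifier consume it. The codebook-ordering step is one simple way to make vectors comparable; other schemes are equally plausible.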

Abstract

This paper investigates the ability of Neural Network (NN) models to recognize speech emotions. Extensive simulations of the Radial Basis Function Network (RBFN), the Multilayer Perceptron (MLP), and the Probabilistic Neural Network (PNN) were carried out to determine the Speech Emotion Recognition (SER) accuracy for emotional states such as anger, happiness, sadness, and boredom. The utterances for these states were chosen from the standard Berlin EMO-DB database. Efficient cepstral-domain vocal tract features, namely Linear Predictive Cepstral Coefficients (LPCC), Mel Frequency Cepstral Coefficients (MFCCs), and Perceptual Linear Prediction (PLP) coefficients, were tested for their ability to discriminate emotions in the proposed setup. These features were extracted at the frame level and clustered into corresponding Vector Quantized (VQ) coefficients to remove redundant information before the chosen classifiers were simulated. The NN-based identification models reached the desired level of SER accuracy, since these classifiers remain effective on low-dimensional feature sets. The PNN achieved the highest accuracy, 83%, with the LPCCVQ feature set, compared with 82% for the RBFN and 78% for the MLP. Our results show that, among the derived feature sets, LPCCVQ is the most reliable for characterizing the target speech emotions, and that the PNN outperforms the other NN classifiers.
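In its standard formulation, a PNN is a Parzen-window classifier: a pattern layer with one Gaussian kernel per stored training vector, a summation layer that averages kernel activations per class, and an output layer that picks the largest class score. The sketch below is illustrative only, not the authors' implementation. It assumes equal class priors, a single shared kernel width `sigma` (a hypothetical default that would normally be tuned), and fixed-length inputs such as one VQ-derived vector per utterance; the class name `PNN` and its methods are hypothetical.

```python
import numpy as np


def _log_mean_exp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(mean(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).mean(axis=axis, keepdims=True))).squeeze(axis)


class PNN:
    """Minimal probabilistic neural network (Gaussian Parzen-window classifier)."""

    def __init__(self, sigma: float = 1.0):
        self.sigma = sigma  # kernel width (smoothing parameter), shared by all classes

    def fit(self, X, y):
        # Pattern layer: a PNN simply stores every training vector
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        self.classes_ = np.unique(self.y_)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from each query to each stored pattern: (n_query, n_train)
        d2 = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        log_k = -d2 / (2.0 * self.sigma ** 2)  # log of the Gaussian kernel activations
        # Summation layer: mean activation per class, computed in the log domain
        scores = np.stack(
            [_log_mean_exp(log_k[:, self.y_ == c], axis=1) for c in self.classes_], axis=1
        )
        # Output layer: class with the highest estimated density (equal priors assumed)
        return self.classes_[scores.argmax(axis=1)]
```

Because the kernel acts on raw Euclidean distance, features are usually standardized before fitting and `sigma` is chosen on held-out data. Computing class scores in the log domain keeps queries far from all training vectors from underflowing to zero for every class.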

Cite This Study

Palo et al. studied this question.

synapsesocial.com/papers/699d401ade8e28729cf65240
https://doi.org/10.14419/ijet.v7i4.39.23820