For emotion recognition, we selected pitch, log energy, formant, mel-band energies, and mel frequency cepstral coefficients (MFCCs) as the base features, and added velocity/acceleration of pitch and MFCCs to form feature streams. We extracted statistics used for discriminative classifiers, assuming that each stream is a one-dimensional signal. Extracted features were analyzed by using quadratic discriminant analysis (QDA) and support vector machine (SVM). Experimental results showed that pitch and energy were the most important factors. Using two different kinds of databases, we compared emotion recognition performance of various classifiers: SVM, linear discriminant analysis (LDA), QDA and hidden Markov model (HMM). With the text-independent SUSAS database, we achieved the best accuracy of 96.3% for stressed/neutral style classification and 70.1% for 4-class speaking style classification using Gaussian SVM, which is superior to the previous results. With the speaker-independent AIBO database, we achieved 42.3% accuracy for 5-class emotion recognition.
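The stream-and-statistics idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`delta`, `stream_statistics`, `utterance_features`) are hypothetical, the particular statistics (mean, standard deviation, minimum, maximum, range) are assumed because the abstract does not list which ones were used, and velocity/acceleration are approximated by simple first differences, whereas the paper may use a different estimator.

```python
import statistics


def delta(stream):
    """First-order difference, used here as a simple velocity estimate (assumption)."""
    return [b - a for a, b in zip(stream, stream[1:])]


def stream_statistics(stream):
    """Summarise a one-dimensional stream with a few utterance-level statistics.

    The choice of statistics is illustrative; the abstract does not specify them.
    """
    return {
        "mean": statistics.fmean(stream),
        "std": statistics.pstdev(stream),
        "min": min(stream),
        "max": max(stream),
        "range": max(stream) - min(stream),
    }


def utterance_features(pitch):
    """Concatenate statistics of a base stream, its velocity, and its acceleration."""
    velocity = delta(pitch)
    acceleration = delta(velocity)
    features = []
    for stream in (pitch, velocity, acceleration):
        stats = stream_statistics(stream)
        features.extend(stats[k] for k in ("mean", "std", "min", "max", "range"))
    return features


# Toy pitch contour in Hz (made-up values, for illustration only).
example = utterance_features([100.0, 110.0, 130.0, 120.0])
```

Each utterance then maps to a fixed-length vector (15 values for this single stream), which is the kind of input a discriminative classifier such as an SVM, LDA, or QDA expects, regardless of how many frames the utterance contains.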
Kwon et al. studied this question.