Key points are not available for this paper at this time.
Current approaches to the recognition of emotion within speech usually use statistic feature information obtained by application of functionals on turn- or chunk levels. Yet, it is well known that thereby important information on temporal sub-layers as the frame-level is lost. We therefore investigate the benefits of integration of such information within turn-level feature space. For frame-level analysis we use GMM for classification and 39 MFCC and energy features with CMS. In a subsequent step output scores are fed forward into a 1.4k large-feature-space turn-level SVM emotion recognition engine. Thereby we use a variety of Low-Level-Descriptors and functionals to cover prosodic, speech quality, and articulatory aspects. Extensive testruns are carried out on the public databases EMO-DB and SUSAS. Speaker-independent analysis is faced by speaker normalization. Overall results highly emphasize the benefits of feature integration on diverse time scales.
Vlasenko et al. (Mon,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: