August 16, 2025

Cross-Lingual Attention-based Mechanism for Speech Emotion Recognition

Key Points

The CLAF-SER framework achieved its best performance when the datasets were combined, handling multiple emotion classes effectively.
Experiments used datasets such as RAVDESS, TESS, and EMO-DB to evaluate model accuracy and emotion classification.
The Recurrent Neural Network (RNN) used in the framework relies on features such as MFCC and pitch to classify emotions.
Findings suggest that cross-lingual attention mechanisms enhance emotional speech detection across diverse languages.

Abstract

Speech emotion recognition (SER) is an emerging area of emotion detection within affective computing. This work focuses on recordings of emotional speech, that is, spoken words delivered during verbal communication. Emotion is inferred from the acoustic properties of the speech signal and modeled with machine learning. We ran a series of experiments on the RAVDESS, TESS, SAVEE, and EMO-DB datasets to test whether a Recurrent Neural Network (RNN) and CLAF-SER, the Cross-Lingual Attention-Based Adversarial Framework for SER, can detect and classify emotions such as sadness, anger, happiness, neutrality, and fear. Features such as MFCC, LPCC, pitch, energy, and chroma were extracted before training the RNN. With the RNN, TESS achieved the highest accuracy of the individual datasets. However, CLAF-SER gives the best performance when all datasets are combined.
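
To make the feature-extraction step concrete, the following is a minimal sketch of how two of the listed features, energy and pitch, could be computed frame by frame before being fed to a sequence model. It is not the authors' pipeline: the function names, the frame length, the hop size, and the autocorrelation-based pitch estimator are all assumptions chosen for illustration, and a synthetic tone stands in for a real recording.

```python
import math

# Hypothetical helpers for illustration; names and defaults are assumptions,
# not taken from the paper.

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # Return 0.0 when no positive correlation peak is found (e.g. silence).
    return sample_rate / best_lag if best_lag else 0.0

def extract_features(samples, sample_rate, frame_len=400, hop=200):
    """Return one (energy, pitch) pair per frame."""
    return [(short_time_energy(f), autocorr_pitch(f, sample_rate))
            for f in frame_signal(samples, frame_len, hop)]

# Synthetic 200 Hz tone sampled at 8 kHz, standing in for a speech clip.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(2000)]
features = extract_features(tone, sample_rate)

for energy, pitch in features[:3]:
    print(f"energy={energy:.3f} pitch={pitch:.1f} Hz")
```

For the synthetic tone, each frame should report an energy near 0.5 and a pitch near 200 Hz. In a real setting, the per-frame vectors would also include features such as MFCC, LPCC, and chroma, and the resulting sequence would form the input to the RNN.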
