September 18, 2024Open Access

Multimodal emotion recognition based on a fusion of audiovisual information with temporal dynamics

Puntos clave

Los puntos clave no están disponibles para este artículo en este momento.

Resumen

Abstract In the Human-Machine Interactions (HMI) landscape, understanding user emotions is pivotal for elevating user experiences. This paper explores Facial Expression Recognition (FER) within HMI, employing a distinctive multimodal approach that integrates visual and auditory information. Recognizing the dynamic nature of HMI, where situations evolve, this study emphasizes continuous emotion analysis. This work assesses various fusion strategies that involve the addition to the main network of different architectures, such as autoencoders (AE) or an Embracement module, to combine the information of multiple biometric cues. In addition to the multimodal approach, this paper introduces a new architecture that prioritizes temporal dynamics by incorporating Long Short-Term Memory (LSTM) networks. The final proposal, which integrates different multimodal approaches with the temporal focus capabilities of the LSTM architecture, was tested across three public datasets: RAVDESS, SAVEE, and CREMA-D. It showcased state-of-the-art accuracy of 88.11%, 86.75%, and 80.27%, respectively, and outperformed other existing approaches.

Leer artículo completoexternamente

Preguntar a la IA

Me gusta

Guardar

Ver artículo completo

Cite This Study

Salas-Cáceres et al. (Wed,) studied this question.

synapsesocial.com/papers/68e581eab6db64358751f3cb https://doi.org/https://doi.org/10.1007/s11042-024-20227-6

Also Consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Preguntar a la IA

Me gusta

Guardar

Ver artículo completo