April 26, 2024

Sequence Modeling and Feature Fusion for Multimodal Emotion Recognition


Abstract

Multimodal emotion recognition (MER) is the recognition and understanding of emotional states from multiple types of data, such as audio and video. In applications such as human-computer interaction, virtual assistants, and social media, an accurate understanding of user emotions can enable smarter and more personalized services. For example, a virtual assistant that better understands a user's emotional needs can give responses that better match the user's expectations. However, existing LSTM-based MER methods model speaker context dependence, the temporal order of interlocutors, and dialogue context poorly. This paper presents a novel Transformer-based approach for extracting multimodal emotion features, designed specifically for tracking emotions in dialogue. A multi-attention mechanism emphasizes informative content and attends to the global context of the data, producing semantically and visually related features that are fused through multi-level attention. Finally, the three modalities are fused at the decision level, which yields emotion recognition with high accuracy and strong generalization and improves the model's ability to understand complex scenes. On the IEMOCAP dataset, our proposed method achieves the best emotion recognition results among the compared methods.
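
The abstract does not say how the decision-level fusion is computed. As a minimal sketch, assuming each modality has its own classifier that outputs one score per emotion class, decision-level (late) fusion can be illustrated as a weighted average of per-modality softmax probabilities. The four-class label set, the modality names (text, audio, video), the weights, and the function names decision_level_fusion and predict are all hypothetical choices for illustration, not the authors' method.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label set (assumption): four classes often used with IEMOCAP.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one modality's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decision_level_fusion(
    modality_logits: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Late fusion: weighted average of per-modality class probabilities.

    Each modality is classified independently; only the output distributions
    are merged. The weights are hypothetical hyperparameters (e.g. tuned on a
    validation split) and are renormalized so the result sums to 1.
    """
    total_weight = sum(weights[name] for name in modality_logits)
    fused = [0.0] * len(EMOTIONS)
    for name, logits in modality_logits.items():
        probs = softmax(logits)
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight
    return fused


def predict(modality_logits: Dict[str, Sequence[float]],
            weights: Dict[str, float]) -> str:
    """Return the emotion label with the highest fused probability."""
    fused = decision_level_fusion(modality_logits, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


if __name__ == "__main__":
    # Made-up logits for a single utterance, one vector per modality.
    logits = {
        "text": [0.2, 2.1, 0.5, -1.0],
        "audio": [1.5, 1.2, 0.3, -0.5],
        "video": [0.1, 0.9, 1.1, 0.0],
    }
    weights = {"text": 0.5, "audio": 0.3, "video": 0.2}
    print(predict(logits, weights))
```

In this toy input the audio scores alone favor "angry" and the video scores alone favor "neutral", but the weighted combination selects "happy". This shows the usual motivation for late fusion: the modalities can disagree, and the final decision is taken over their combined output distributions rather than over any single one.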
