September 24, 2024 · Open Access

MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition

Abstract

Speech emotion recognition (SER) is challenging owing to the complexity of emotional representation. This article therefore focuses on multimodal speech emotion recognition, which analyzes the speaker's emotional state from both audio signals and textual content. Existing multimodal approaches use sequential networks to capture the temporal dependency in each feature sequence and ignore the underlying relations within the acoustic and textual modalities. Moreover, current feature-level and decision-level fusion methods have unresolved limitations. This paper therefore develops a novel multimodal fusion graph convolutional network that models information interactions both within and between the two modalities. Specifically, we construct intra-modal relations to extract the intrinsic characteristics exclusive to each modality. For inter-modal fusion, a multi-perspective fusion mechanism is devised to integrate the complementary information between the two modalities. Extensive experiments on the IEMOCAP and RAVDESS datasets demonstrate that our approach achieves superior performance.

• Develop a multimodal fusion graph convolutional network to execute the intra- and inter-modal interactions.
• Excavate the sentiment, semantic, and temporal dependency to construct the intra-modal relations.
• Devise a multi-perspective fusion mechanism for inter-modal fusion.
• Adopt a multi-angle loss to optimize the model.
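To make the idea of intra- and inter-modal graph interactions concrete, the following is a minimal sketch of one standard graph convolution step (symmetric-normalized adjacency, as in common GCN formulations) over a small graph of audio and text nodes. It is not the authors' architecture: the abstract does not specify how the sentiment, semantic, and temporal relations or the multi-perspective fusion are built, so the node counts, feature sizes, temporal-window edges, fully connected cross-modal edges, and mean pooling below are all illustrative assumptions.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are >= 1 after self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

def temporal_adjacency(n, window=1):
    """Assumed intra-modal relation: link nodes within `window` time steps."""
    idx = np.arange(n)
    adj = (np.abs(idx[:, None] - idx[None, :]) <= window).astype(float)
    np.fill_diagonal(adj, 0.0)              # self-loops are added later
    return adj

# Hypothetical sizes: 4 audio segments, 3 text tokens, 8-dim features.
rng = np.random.default_rng(0)
n_audio, n_text, dim, hidden = 4, 3, 8, 5
audio = rng.normal(size=(n_audio, dim))
text = rng.normal(size=(n_text, dim))

# Joint graph: intra-modal blocks on the diagonal, inter-modal blocks off it.
n = n_audio + n_text
adj = np.zeros((n, n))
adj[:n_audio, :n_audio] = temporal_adjacency(n_audio)
adj[n_audio:, n_audio:] = temporal_adjacency(n_text)
adj[:n_audio, n_audio:] = 1.0               # assumed: every audio node sees every text node
adj[n_audio:, :n_audio] = 1.0

features = np.vstack([audio, text])
weight = rng.normal(size=(dim, hidden))     # random stand-in for learned weights
out = gcn_layer(normalize_adjacency(adj), features, weight)
pooled = out.mean(axis=0)                   # assumed graph-level readout
```

After one layer, each node's representation mixes its own features with those of its temporal neighbours in the same modality and of the nodes in the other modality, which is the general kind of interaction the abstract describes.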

Cite this study

Qi et al. (2024) studied this question.

synapsesocial.com/papers/69ff4aab4716aad0cc85479e — DOI: https://doi.org/10.1016/j.neucom.2024.128646

Authors

Xin Qi

University of Technology Sydney

Yujun Wen

Communication University of China

Pengzhou Zhang

Communication University of China

Journals

Neurocomputing

Institutions

Beijing Institute of Technology

Communication University of China
