Research into emotion detection is crucial because of the wide range of fields that can benefit from it, including healthcare, intelligent customer service, and education. Compared with unimodal approaches, multimodal emotion recognition (MER) integrates several modalities, including text, facial expressions, and voice, to provide better accuracy and robustness. This article provides a historical and present-day overview of MER, focusing on its relevance, difficulties, and approaches. We examine several datasets, including IEMOCAP and MELD, comparing and contrasting their features and shortcomings. The literature review covers recent developments in deep learning approaches, particularly fusion strategies such as early, late, and hybrid fusion. Data redundancy, complicated feature extraction, and real-time detection are among the identified shortcomings. Our suggested technique enhances emotion recognition accuracy by using deep learning for feature extraction together with a hybrid fusion approach. This study intends to guide future investigations toward overcoming existing limitations and advancing the field of MER. The primary body of this work consists of examining various data fusion strategies, reviewing new methodologies in multimodal emotion identification, and identifying open problems and research needs.
Sanku et al. (Tue,) studied this question.