ABSTRACT Emotion detection from face and speech is integral to human–computer interaction, mental health assessment, social robotics, and emotional intelligence. Traditional machine learning methods typically depend on handcrafted features and are primarily centred on unimodal systems. However, the unique characteristics of facial expressions and the variability in speech features make it difficult to capture complex emotional states. Accordingly, deep learning models have become instrumental in automatically extracting intrinsic emotional features with greater accuracy across multiple modalities. This article presents a comprehensive review of recent progress in emotion detection, spanning unimodal to multimodal systems, with a focus on facial and speech modalities. It examines state-of-the-art machine learning, deep learning, and recent transformer-based approaches for emotion detection. The review aims to provide an in-depth analysis of both unimodal and multimodal emotion detection techniques, highlighting their limitations, popular datasets, challenges, and the best-performing models. This analysis helps researchers select the most appropriate datasets and audio-visual emotion detection models. Key findings suggest that integrating multimodal data significantly improves emotion recognition, particularly when utilising deep learning methods trained on synchronised audio and video datasets. By assessing recent advancements and current challenges, this article serves as a foundational resource for researchers and practitioners in the field of emotional AI, thereby aiding the creation of more intuitive and empathetic technologies.
Priyanka Thakur
Nirmal Kaur
Naveen Aggarwal
Expert Systems
Panjab University
Thakur et al. studied this question.
www.synapsesocial.com/papers/689a0614e6551bb0af8cd5ed — DOI: https://doi.org/10.1111/exsy.70103