Emotion is a mental state connected to human behaviour, thought, and the degree to which an experience is felt as positive or negative, yet it still lacks a precise definition. Enabling AI systems to recognise human emotions accurately and respond to them empathetically could transform human-machine interaction and open the door to more sophisticated, emotionally intelligent computers. The main research problem is building models that read emotions accurately from multimodal data: video requires large, diverse datasets to capture complex emotional cues, while audio requires fine-tuned CNNs to detect subtle changes in speech. This study introduces a novel multimodal emotion detection method that combines voice and video modalities to infer emotional states. An attention-based CNN-Bi-LSTM model handles the video component, and its bidirectional layers provide deeper semantic understanding. An attention-based fusion step then blends the outputs of the two modalities, balancing their respective contributions. The proposed method is evaluated on two datasets, the YouTube and Carnegie Mellon University SAVEE datasets, and the results show higher efficacy than current frameworks. The approach enables accurate emotion recognition and contributes to ongoing developments in the field.
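The attention-based fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `attention_fuse`, the four-class label set, the probability vectors, and the hand-set relevance scores are all hypothetical and chosen only for illustration. In a trained model the scores would presumably be produced by learned attention layers rather than passed in by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(audio_probs, video_probs, audio_score, video_score):
    """Blend two per-class probability vectors with attention weights.

    audio_score and video_score are unnormalised relevance scores
    (hypothetical here; a real model would learn them). They are
    normalised with softmax so the two modality weights sum to 1.
    """
    if len(audio_probs) != len(video_probs):
        raise ValueError("both modalities must cover the same classes")
    w_audio, w_video = softmax([audio_score, video_score])
    fused = [w_audio * a + w_video * v
             for a, v in zip(audio_probs, video_probs)]
    return fused, (w_audio, w_video)


# Hypothetical label set and per-modality predictions, for illustration only.
EMOTIONS = ["angry", "happy", "sad", "neutral"]
audio = [0.10, 0.60, 0.10, 0.20]
video = [0.05, 0.25, 0.60, 0.10]

# Equal scores give each modality the same weight.
fused, weights = attention_fuse(audio, video, audio_score=0.0, video_score=0.0)
predicted = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]

# A much higher video score shifts the decision toward the video branch.
fused_v, weights_v = attention_fuse(audio, video, audio_score=0.0, video_score=5.0)
predicted_v = EMOTIONS[max(range(len(fused_v)), key=fused_v.__getitem__)]
```

Because the weights come from a softmax, they always sum to one, so the fused vector remains a valid probability distribution while the relative scores control how much each modality contributes.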
J. Biju
K. Lavanya
J. Raja
SHILAP Revista de lepidopterología
SRM Institute of Science and Technology
Karunya University
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Biju et al. studied this question.
www.synapsesocial.com/papers/69f837ab3ed186a739981ce7 — DOI: https://doi.org/10.5935/jetia.v12i58.2986