September 19, 2024

Open Access

EAV: EEG-Audio-Video Dataset for Emotion Recognition in Conversational Contexts

Q: What evidence does this study provide?

Study design: Other (dataset collection). Population: 42 participants (emotion recognition research). Intervention: Cue-based conversation scenario eliciting five distinct emotions. Primary outcome: Baseline emotion recognition performance for each modality.

Key Result

The EAV dataset provides multimodal EEG, audio, and video recordings from 42 participants engaged in cue-based conversations. Baseline deep neural network models achieved emotion recognition accuracies of 60.0% for EEG, 61.9% for audio, and 71.4% for video.
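With five emotion classes, chance-level accuracy is 1/5 = 20%, but only if the test samples are balanced across classes. That balance is an assumption here, because this summary does not state the class distribution. A minimal sketch that puts the reported accuracies next to that reference point:

```python
from typing import Dict

# Baseline accuracies (percent) as reported in the key result above.
REPORTED_ACCURACY = {"EEG": 60.0, "audio": 61.9, "video": 71.4}

# neutral, anger, happiness, sadness, calmness
NUM_CLASSES = 5


def chance_level(num_classes: int) -> float:
    """Chance accuracy in percent, assuming a class-balanced num_classes-way task."""
    return 100.0 / num_classes


def margin_over_chance(accuracies: Dict[str, float], num_classes: int) -> Dict[str, float]:
    """Percentage points by which each modality exceeds balanced chance, rounded to 0.1."""
    chance = chance_level(num_classes)
    return {modality: round(acc - chance, 1) for modality, acc in accuracies.items()}


if __name__ == "__main__":
    chance = chance_level(NUM_CLASSES)
    for modality, margin in margin_over_chance(REPORTED_ACCURACY, NUM_CLASSES).items():
        print(f"{modality}: +{margin} points over {chance:.0f}% chance")
```

If the real test split is imbalanced, the right reference is the majority-class rate rather than 20%.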

Structured PICO

Population

42 participants engaging in a cue-based conversation scenario eliciting five distinct emotions (neutral, anger, happiness, sadness, and calmness), contributing 200 interactions each.

Intervention

Multimodal emotion dataset collection (30-channel EEG, audio, and video recordings)
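A hypothetical per-trial record can make the collection design concrete. The class name `Interaction`, its field names, and the data layouts are illustrative assumptions, not the EAV file format. The `phase` field also assumes that each interaction is labeled either listening or speaking. The checks enforce only what the summary states: five emotion labels and 30 EEG channels.

```python
from dataclasses import dataclass
from typing import List

# Five emotion classes and 30 EEG channels, as stated in the summary.
EMOTIONS = ("neutral", "anger", "happiness", "sadness", "calmness")
NUM_EEG_CHANNELS = 30
# Assumption: each interaction is tagged as either a listening or a speaking trial.
PHASES = ("listening", "speaking")


@dataclass
class Interaction:
    """Hypothetical trial record; field names are illustrative, not the dataset's schema."""

    participant_id: int
    emotion: str
    phase: str
    eeg: List[List[float]]  # assumed layout: one list of samples per EEG channel
    audio: List[float]      # assumed layout: mono waveform samples
    video_frames: int       # frame count only; pixel data is omitted in this sketch

    def validate(self) -> None:
        """Raise ValueError if the record contradicts the stated dataset design."""
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion label: {self.emotion!r}")
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if len(self.eeg) != NUM_EEG_CHANNELS:
            raise ValueError(f"expected {NUM_EEG_CHANNELS} EEG channels, got {len(self.eeg)}")
        if len({len(channel) for channel in self.eeg}) > 1:
            raise ValueError("EEG channels have unequal sample counts")


if __name__ == "__main__":
    trial = Interaction(
        participant_id=1,
        emotion="happiness",
        phase="speaking",
        eeg=[[0.0] * 500 for _ in range(NUM_EEG_CHANNELS)],
        audio=[0.0] * 16000,
        video_frames=30,
    )
    trial.validate()  # passes silently for a well-formed record
```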

Outcome

Baseline performance of emotion recognition for each modality using established deep neural network (DNN) methods
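The summary reports per-modality accuracy but does not describe the evaluation protocol, such as data splits or per-class metrics. The sketch below shows only the generic computation of accuracy and a confusion matrix over the five emotion labels. It is not the authors' evaluation pipeline. The toy label lists in the usage example are made up for illustration.

```python
from typing import List, Sequence

EMOTIONS = ("neutral", "anger", "happiness", "sadness", "calmness")


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str] = EMOTIONS
) -> List[List[int]]:
    """Rows are true labels and columns are predicted labels, both in the order of `labels`.

    A label that is not in `labels` raises KeyError.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the EAV study.
    y_true = ["anger", "anger", "sadness", "neutral"]
    y_pred = ["anger", "happiness", "sadness", "calmness"]
    print(accuracy(y_true, y_pred))
    for label, row in zip(EMOTIONS, confusion_matrix(y_true, y_pred)):
        print(f"{label:>10}: {row}")
```

The same two functions would apply unchanged to each modality's predictions, which keeps the per-modality numbers directly comparable.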

The EAV dataset provides a novel multimodal resource incorporating EEG, audio, and video for modeling human emotional processes in conversational contexts.

Limitations

The conversational scenario uses cue-based, posed conditions, which may not fully replicate naturalistic settings.

Abstract

Understanding emotional states is pivotal for the development of next-generation human-machine interfaces. Human behavior in social interactions reflects psycho-physiological processes shaped by perceptual inputs. Efforts to understand brain function and human behavior could therefore help drive the development of AI models with human-like attributes. In this study, we introduce a multimodal emotion dataset comprising 30-channel electroencephalography (EEG), audio, and video recordings from 42 participants. Each participant engaged in a cue-based conversation scenario eliciting five distinct emotions: neutral, anger, happiness, sadness, and calmness. Over the course of the experiment, each participant contributed 200 interactions, encompassing both listening and speaking, for a total of 8,400 interactions across all participants. We evaluated the baseline emotion recognition performance of each modality using established deep neural network (DNN) methods. The Emotion in EEG-Audio-Visual (EAV) dataset is the first public dataset to incorporate three primary modalities for emotion recognition within a conversational context. We anticipate that this dataset will contribute significantly to modeling the human emotional process from both fundamental neuroscience and machine learning perspectives.
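The interaction totals in the abstract follow from simple multiplication: 42 participants × 200 interactions = 8,400. The per-emotion figures below rest on an extra assumption that this summary does not confirm: that each participant's 200 interactions are split evenly across the five emotions, giving 40 per emotion per participant.

```python
from typing import Dict

NUM_PARTICIPANTS = 42
INTERACTIONS_PER_PARTICIPANT = 200
EMOTIONS = ("neutral", "anger", "happiness", "sadness", "calmness")


def interaction_counts(participants: int, per_participant: int, num_emotions: int) -> Dict[str, int]:
    """Total interactions, plus per-emotion counts under an assumed even split."""
    if per_participant % num_emotions:
        raise ValueError("interactions do not divide evenly across emotions")
    per_emotion = per_participant // num_emotions  # assumption: balanced design
    return {
        "total": participants * per_participant,  # stated in the abstract
        "per_emotion_per_participant": per_emotion,
        "per_emotion_total": participants * per_emotion,
    }


if __name__ == "__main__":
    print(interaction_counts(NUM_PARTICIPANTS, INTERACTIONS_PER_PARTICIPANT, len(EMOTIONS)))
```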

Cite This Study

Lee et al. (2024) conducted a dataset study in emotion recognition (n=42). Using a cue-based conversation scenario that elicited five distinct emotions, they evaluated baseline emotion recognition performance for each modality. The EAV dataset provides multimodal EEG, audio, and video recordings from 42 participants engaged in cue-based conversations; baseline accuracies were 60.0% for EEG, 61.9% for audio, and 71.4% for video.

synapsesocial.com/papers/6a0a3124b0d552aa8b461070 https://doi.org/10.1038/s41597-024-03838-4
