Abstract
As one of the important features that differentiate humans from machines, emotion is complicated not only in its wide variety but also in its expression channels, which include both verbal and non-verbal language. Different modalities contribute in unique ways to the integrated expression of emotion. However, most existing multimodal datasets provide only one unified emotion label for all modalities, ignoring the heterogeneity and complementarity of the different modalities. For instance, the text "I love the test" may be labelled as love or joy, but if it is expressed with a low, dejected tone and a mournful expression, the overall emotion might be closer to sadness or even disgust. To bridge this gap, we present UniC, a novel multimodal emotion dataset featuring both integrated multimodal labels and independent unimodal labels. UniC comprises 965 emotion-rich video clips selected from YouTube, annotated in text, audio, (silent) video and multimodal setups with both categorical and dimensional labels. We present the dataset construction steps and an analysis of different modality perspectives based on UniC. We find that although in most cases the text modality shares more emotional resemblance with the multimodal setup, other modalities may carry different and even opposite emotions, which might contribute more to the overall emotional state. This dataset contributes a modality-specific perspective to multimodal emotion analysis, and has the potential to offer more insights for further research in human-machine interaction and emotion modelling for robots.
Du et al. studied this question.