April 22, 2026 | Open Access

Emotional Recognition Under Multimodal Conflict: A Gaze-Based Response Task

Key Points

This research investigates the impact of multimodal emotional conflict on emotion recognition during video viewing.
Forty-seven undergraduate students participated in a gaze-based response task.
After viewing each video, which carried either congruent or incongruent emotional cues, participants judged the overall emotion conveyed.
Data were analyzed using repeated-measures ANOVAs and generalized linear mixed-effects models.
Participants were more accurate for congruent than for incongruent stimuli across all domains (semantic, facial, and vocal).
Semantic content showed the greatest performance reduction under incongruence, followed by facial expression and vocal prosody.
A significant Congruency × Domain interaction indicated that the impact of emotional conflict on recognition was domain-specific.
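The interference effect summarized above can be made concrete as a congruent-minus-incongruent accuracy difference per domain. The sketch below is a minimal illustration, not the authors' pipeline. It assumes trial-level records with hypothetical fields `participant`, `domain`, `congruent`, and `correct`. It computes each participant's accuracy in every domain × congruency cell, takes the congruent-minus-incongruent difference per participant, and averages across participants. This operationalization is an assumption, and the toy trials are invented rather than taken from the study.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    # Hypothetical trial record; field names are illustrative, not from the study.
    participant: str
    domain: str       # e.g. "semantic", "facial", "prosody"
    congruent: bool   # True if the cues in the video agree
    correct: bool     # True if the overall-emotion judgment was correct


def interference_by_domain(trials):
    """Return {domain: mean(congruent accuracy - incongruent accuracy)}.

    Accuracy is computed per participant within each domain x congruency
    cell, the congruent-minus-incongruent difference is taken per
    participant, and the differences are averaged across participants.
    """
    # (participant, domain, congruent) -> [n_correct, n_trials]
    cells = defaultdict(lambda: [0, 0])
    for t in trials:
        cell = cells[(t.participant, t.domain, t.congruent)]
        cell[0] += int(t.correct)
        cell[1] += 1

    diffs = defaultdict(list)  # domain -> per-participant differences
    for (pid, domain, congruent), (hits, n) in cells.items():
        if not congruent:
            continue
        # .get does not insert into the defaultdict, so iterating is safe
        inc = cells.get((pid, domain, False))
        if inc is None:
            continue  # no incongruent trials for this participant and domain
        diffs[domain].append(hits / n - inc[0] / inc[1])

    return {domain: mean(values) for domain, values in diffs.items()}


# Toy input, invented for illustration only (not study data).
toy = [
    Trial("p1", "semantic", True, True),
    Trial("p1", "semantic", False, False),
    Trial("p1", "facial", True, True),
    Trial("p1", "facial", False, True),
]
print(interference_by_domain(toy))  # {'semantic': 1.0, 'facial': 0.0}
```

Averaging per-participant differences mirrors the repeated-measures structure; pooling all trials instead would let participants with more trials dominate the estimate.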

Abstract

Background: Emotion recognition relies on the integration of multiple affective cues. In everyday contexts, however, facial expressions, vocal prosody, and semantic content may convey incongruent emotional information, generating emotional conflict and increasing cognitive demands.

Objective: The present study examined how multimodal emotional conflict affects emotion recognition during video viewing, focusing on short videos in which a single actor simultaneously conveyed incongruent emotional cues across facial, vocal, and semantic channels.

Methods: Forty-seven undergraduate students completed a gaze-based response task in which, after each short video, they provided a single judgment of the overall emotion conveyed by the stimulus. The videos depicted either congruent or incongruent combinations of semantic content, facial expression, and vocal prosody across six basic emotions and a neutral condition. Data were analyzed using repeated-measures ANOVAs and generalized linear mixed-effects models.

Results: Accuracy was consistently higher for congruent than for incongruent stimuli across all domains, indicating a robust emotional interference effect. Critically, the magnitude of this effect differed by domain: semantic content showed the largest performance reduction under incongruence, followed by facial expression and vocal prosody. Mixed-effects models confirmed these effects while accounting for participant- and item-level variability and revealed a significant Congruency × Domain interaction.

Conclusions: In a gaze-based response task requiring a single overall emotion judgment, emotional conflict disrupted recognition in a domain-specific manner, with semantic information being particularly vulnerable to multimodal interference.
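The abstract does not report the exact mixed-effects specification. As a hedged sketch, assume trial-level accuracy is coded as binary and modeled with a logit link, with crossed random intercepts for participants and items to reflect "participant- and item-level variability." A model of that kind could be written as:

```latex
\operatorname{logit}\Pr(y_{pj} = 1)
  = \beta_0 + \beta_1 C_j + \boldsymbol{\beta}_2^{\top}\mathbf{d}_j
  + \boldsymbol{\beta}_3^{\top}\left(C_j\,\mathbf{d}_j\right) + u_p + w_j,
\qquad u_p \sim \mathcal{N}(0, \sigma_u^2), \quad w_j \sim \mathcal{N}(0, \sigma_w^2)
```

Here $y_{pj}$ is 1 if participant $p$ judged item $j$ correctly, $C_j$ is a congruency indicator, and $\mathbf{d}_j$ holds dummy codes for the three-level Domain factor (semantic, facial, prosody) against a reference level. Under these assumptions the Congruency × Domain interaction corresponds to $\boldsymbol{\beta}_3$. The original analysis may have used random slopes or a different link function.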