What question did this study set out to answer?

This research aims to develop a model for emotion recognition in E-learning by leveraging the semantic hierarchy of instructional videos along with physiological signals.

May 8, 2026 · Open Access

Exploiting multimodal video semantic hierarchy for emotion recognition in E-learning

Key Points

Proposed an Emotion Recognition Model based on Multimodal Video Semantic Hierarchy.
Constructed hierarchical video semantics and integrated them gradually through hierarchical stacking.
Combined the fused semantic representation with learners' eye-movement signals for improved emotion detection.
The model showed significant improvement in emotion recognition performance on three datasets: VLMED, HCI-Tagging, and DEAP.
Experimental validation confirmed the effectiveness of integrating video semantics and physiological signals.

Abstract

In E-learning, accurately recognizing learners’ emotions is a crucial prerequisite for enhancing learning outcomes and teaching quality. Most existing emotion recognition studies identify learners’ emotions by integrating their physiological signals and facial expressions, but they often overlook how the different hierarchical levels of semantics embedded in instructional videos affect learners’ emotions. We therefore propose an Emotion Recognition Model based on Multimodal Video Semantic Hierarchy. The model constructs hierarchical video semantics and gradually integrates them through hierarchical stacking. The fused semantic representation is then combined with the learners’ eye-movement physiological signals to enhance emotion recognition performance. Experimental results on three public multimodal physiological datasets, VLMED, HCI-Tagging, and DEAP, confirm the model’s effectiveness in emotion recognition tasks.
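
The pipeline described in the abstract (build semantic features at several levels, fuse them step by step, then combine the result with eye-movement signals) can be illustrated with a minimal sketch. This is not the authors' implementation. The three level names (frame, shot, video), the feature values, the eye-movement features, the layer sizes, and the random placeholder weights are all assumptions made for illustration, and "hierarchical stacking" is interpreted here simply as repeated concatenate-and-project fusion from the lowest level upward.

```python
import math
import random


def make_weights(n_out, n_in, rng):
    """Random placeholder weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def linear(x, weights):
    """Dense layer with tanh activation: one output per weight row."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def stack_hierarchy(levels, hidden, rng):
    """Fuse semantic levels one at a time, lowest level first."""
    fused = linear(levels[0], make_weights(hidden, len(levels[0]), rng))
    for level in levels[1:]:
        joined = fused + level  # concatenate running summary with next level
        fused = linear(joined, make_weights(hidden, len(joined), rng))
    return fused


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(levels, eye_features, n_classes=3, hidden=4, seed=0):
    """Combine fused video semantics with eye-movement features and classify."""
    rng = random.Random(seed)
    semantic = stack_hierarchy(levels, hidden, rng)
    combined = semantic + eye_features  # late fusion by concatenation (assumed)
    out_w = make_weights(n_classes, len(combined), rng)
    logits = [sum(w * v for w, v in zip(row, combined)) for row in out_w]
    return softmax(logits)


# Hypothetical toy inputs; real features would come from video and eye trackers.
frame_level = [0.2, 0.7, 0.1]
shot_level = [0.5, 0.4]
video_level = [0.9]
eye_movement = [0.3, 0.6]  # e.g. fixation duration, pupil diameter (assumed)

probs = predict_emotion([frame_level, shot_level, video_level], eye_movement)
print(probs)  # one probability per (hypothetical) emotion class
```

Because the weights are untrained, the printed probabilities carry no meaning. The sketch only shows the data flow: each level is folded into a running representation before the physiological features are appended for classification.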
