What question did this study set out to answer?

The aim is to develop a model that accurately represents learner states using multimodal data in virtual reality education.

June 13, 2026 · Open Access

Multimodal data fusion and analysis model for virtual reality education based on artificial intelligence

Key Points

Proposed a cross-modal contrastive learning and dynamic graph attention fusion network.
Temporal encoding applied to modalities including eye-tracking, speech, and interaction logs.
Used a graph neural network to generate fusion representations for educational state discrimination.
Achieved an accuracy of 87.6% in cognitive load level classification.
Obtained an F1 score of 0.912 for the classification task.
Demonstrated effective high-precision modeling of learner states in virtual reality scenarios.
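The summary does not specify the form of the cross-modal contrastive objective. A common choice for aligning paired modalities is a symmetric InfoNCE loss (the formulation popularised by CLIP). The sketch below assumes that embeddings from the same time window in two modalities form a positive pair and that every other segment in the batch acts as a negative. The function names, the toy vectors, and the temperature of 0.07 are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(a: List[Vector], b: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE between two modalities (hypothetical helper).

    a[i] and b[i] embed the same time window (positive pair); every b[j]
    with j != i is a negative for a[i], and vice versa.
    """
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    # sim[i][j] = cosine(a_i, b_j) / temperature
    sim = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    sim_t = [list(col) for col in zip(*sim)]  # transpose: b -> a direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy example: two time windows, two modalities.
eye = [[1.0, 0.0], [0.0, 1.0]]
speech_aligned = [[0.9, 0.1], [0.1, 0.9]]
speech_shuffled = [speech_aligned[1], speech_aligned[0]]

print(symmetric_info_nce(eye, speech_aligned) < symmetric_info_nce(eye, speech_shuffled))  # True
```

Minimising this loss pulls temporally matched segments together in the shared latent space and pushes mismatched ones apart, which is the alignment role the key points attribute to the contrastive stage.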

Abstract

The heterogeneity and semantic fragmentation of multimodal data in virtual reality education make it difficult to model learner states accurately. To address this problem, this paper proposes a cross-modal contrastive learning and dynamic graph attention fusion network. The method first applies temporal encoding to modalities such as eye-tracking, speech, pose, and interaction logs. It then aligns semantically related multimodal segments in a unified latent space through cross-modal contrastive learning. Next, it constructs a heterogeneous graph in which time steps are nodes and dynamic correlations between modalities are edges. A modality-aware graph attention mechanism adaptively aggregates the contribution of each modality at different time points. Finally, a graph neural network generates a fusion representation that drives the educational state discrimination task. Experiments on a self-built VR education multimodal dataset show that the proposed model achieves an accuracy of 87.6% and an F1 score of 0.912 on the cognitive load level classification task. These results indicate that the model supports high-precision, interpretable modeling of learner states in virtual reality education scenarios.
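The abstract describes the fusion stage only at a high level. The sketch below shows one plausible reading of a modality-aware graph attention step, under several assumptions that are not from the paper: the inputs are already projected into the shared latent space (so the learned linear transform of a standard GAT layer is omitted); each time step is linked to its neighbours within a fixed window, whereas the paper derives edges from dynamic cross-modal correlations; each modality has its own attention vector, here hand-set as a stand-in for learned parameters; and node states are mean-pooled into one fusion vector. The names `modality_aware_attention` and `readout` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def modality_aware_attention(
    feats: List[Dict[str, Vector]],  # feats[t][modality] -> aligned embedding at time step t
    attn: Dict[str, Vector],         # hypothetical per-modality attention vector, length 2*d
    window: int = 1,                 # assumed temporal neighbourhood radius
) -> Tuple[List[Vector], List[Dict[str, float]]]:
    T = len(feats)
    d = len(next(iter(feats[0].values())))
    updated: List[Vector] = []
    contributions: List[Dict[str, float]] = []
    for t in range(T):
        # Centre representation of node t: mean of its modality embeddings.
        mods_t = list(feats[t].values())
        centre = [sum(v[k] for v in mods_t) / len(mods_t) for k in range(d)]
        # Candidate messages: every modality at every time step in the window.
        cands: List[Tuple[str, Vector, float]] = []
        for s in range(max(0, t - window), min(T, t + window + 1)):
            for m, x in feats[s].items():
                # GAT-style score on the concatenation [centre || x], modality-specific vector.
                e = leaky_relu(dot(attn[m], centre + x))
                cands.append((m, x, e))
        # Softmax over all (neighbour, modality) candidates.
        top = max(e for _, _, e in cands)
        weights = [math.exp(e - top) for _, _, e in cands]
        z = sum(weights)
        alphas = [w / z for w in weights]
        # Attention-weighted sum gives the updated node state.
        h = [sum(a * x[k] for a, (_, x, _) in zip(alphas, cands)) for k in range(d)]
        updated.append(h)
        # Per-modality share of attention at time t, usable for interpretation.
        share: Dict[str, float] = {}
        for a, (m, _, _) in zip(alphas, cands):
            share[m] = share.get(m, 0.0) + a
        contributions.append(share)
    return updated, contributions


def readout(nodes: List[Vector]) -> Vector:
    """Mean-pool node states into one sequence-level fusion vector."""
    return [sum(col) / len(nodes) for col in zip(*nodes)]


# Toy input: three time steps, two modalities, 2-d aligned embeddings.
feats = [
    {"eye": [1.0, 0.0], "speech": [0.0, 1.0]},
    {"eye": [0.8, 0.2], "speech": [0.1, 0.9]},
    {"eye": [0.9, 0.1], "speech": [0.2, 0.8]},
]
attn = {"eye": [0.5, 0.0, 2.0, 0.0], "speech": [0.5, 0.0, 0.0, 0.5]}
nodes, shares = modality_aware_attention(feats, attn)
fused = readout(nodes)
print(shares)
print(fused)
```

The per-time-step `shares` dictionaries show how a mechanism of this kind can expose which modality dominated at each moment, one possible route to the interpretability the abstract mentions. In a trained model, `fused` would feed a classifier for the cognitive load levels.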


Cite This Study

Xi et al. (2026) studied this question.

synapsesocial.com/papers/6a2cf403faef96ed7f05668a
https://doi.org/10.1007/s44163-026-01493-9
