Real-time assessment and enhancement of student engagement remains a significant challenge in higher education. Existing AI-based systems predominantly offer descriptive analytics without closed-loop feedback and rely on limited data modalities. We propose a novel AI-based multimodal framework that integrates visual, audio, physiological, and behavioral data streams to detect student engagement in real time using a temporal Bidirectional LSTM (BiLSTM) model. The system features a closed-loop feedback mechanism that triggers personalised interventions when engagement drops below adaptive thresholds. The framework is evaluated on four public benchmark datasets (DAiSEE, CMOSE, RoomReader, DIPSER) covering diverse learning environments. Our approach achieves state-of-the-art performance, with a macro-F1 score of up to 0.78 and 82% accuracy on the CMOSE dataset, outperforming vision-only models by 0.10 macro-F1, late-fusion baselines by 0.07 macro-F1, and log-only models by up to 0.22 macro-F1. Ablation studies identify physiological signals as the most significant modality (35% contribution), followed by visual (28%) and audio (22%) cues. In simulated sessions, the closed-loop system successfully triggers context-aware interventions when engagement drops. This work bridges the gap between descriptive analytics and actionable pedagogy, offering a scalable, ethically aware framework for proactive learning support in university settings.
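The closed-loop trigger can be illustrated with a minimal sketch. The abstract does not say how the thresholds adapt, so the rule used here (rolling mean minus `k` standard deviations of a student's recent scores, bounded below by a fixed `floor`) is an assumption, as are the class name `AdaptiveThresholdTrigger`, its parameters, and the example scores. The engagement scores stand in for the per-window output of the BiLSTM model.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveThresholdTrigger:
    """Hypothetical closed-loop trigger (not the authors' implementation).

    Flags an intervention when the current engagement score falls below a
    threshold adapted to the student's own recent history.
    """

    def __init__(self, window=5, k=1.0, floor=0.2):
        self.history = deque(maxlen=window)  # recent engagement scores in [0, 1]
        self.k = k          # standard deviations below the mean that count as a drop
        self.floor = floor  # assumed absolute lower bound on the threshold

    def threshold(self):
        # Until the window is full, fall back to the fixed floor.
        if len(self.history) < self.history.maxlen:
            return self.floor
        adaptive = mean(self.history) - self.k * pstdev(self.history)
        return max(self.floor, adaptive)

    def update(self, score):
        """Return True if `score` should trigger an intervention."""
        triggered = score < self.threshold()
        self.history.append(score)  # the new score shapes future thresholds
        return triggered


# Illustrative scores only; in the framework these would come from the model.
trigger = AdaptiveThresholdTrigger(window=5, k=1.0, floor=0.2)
scores = [0.80, 0.82, 0.78, 0.81, 0.79, 0.80, 0.45, 0.77]
flags = [trigger.update(s) for s in scores]
print(flags)
```

With these example scores only the drop to 0.45 is flagged. Because the threshold follows each student's own baseline, a habitually quiet but attentive student is less likely to be flagged than under a single global cutoff; a deployed system would also need to decide which intervention to issue, which this sketch leaves out.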
Dan et al. (Wed,) studied this question.