To keep online teaching effective, educators must understand students’ learning progress. This study proposes LWKD-ViT, a framework designed to accurately capture students’ emotions during online courses. The framework is built on a lightweight facial expression recognition (FER) model with a modified fusion block, and knowledge distillation (KD) is integrated into the online course platform to enhance performance. The framework first applies face detection, tracking, and clustering to extract a facial sequence for each student. MobileViT-Local, an improved model developed by the authors, then extracts emotion features from individual frames of each student’s facial video stream for classification and prediction. Facial images are captured by students’ device cameras and analyzed in real time on those devices, so no video needs to be transmitted to the teacher’s computer or a remote server. MobileViT-Local was evaluated on the benchmark datasets RAFD, RAF-DB, and FER2013, as well as on a self-built dataset, SCAUOL. Experimental results demonstrate the model’s competitive performance and superior efficiency. With knowledge distillation, the proposed model achieves a prediction accuracy of 94.96%, surpassing other mainstream models. It is also efficient, with FLOPs as low as 0.265 G and a compact size of 4.96 M, while maintaining acceptable accuracy.
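The abstract does not state which distillation objective LWKD-ViT uses. As a rough illustration of what knowledge distillation typically involves, the sketch below assumes the common soft-target formulation: a cross-entropy term on the true label plus a temperature-scaled KL term that pulls the student's softened predictions toward the teacher's. The function names, the `temperature` and `alpha` values, and the seven-class example logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Soft-target distillation loss for one sample.

    Assumed formulation, not confirmed by the paper:
        alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-label term: KL divergence between softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)

    # T^2 keeps the soft-term gradients on a scale comparable to the hard term.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Hypothetical logits for a seven-class expression problem.
teacher = [4.0, 0.5, 0.2, 0.1, 0.3, 0.1, 0.2]
student = [2.5, 0.8, 0.1, 0.4, 0.2, 0.3, 0.1]
loss = kd_loss(student, teacher, label=0)
```

When the student's logits match the teacher's, the KL term vanishes and only the weighted cross-entropy remains, which gives a quick sanity check for any implementation of this kind.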
Wang et al. (Mon,) studied this question.