This study proposes a multi-feature fusion model that combines convolutional neural networks (CNNs) for spatial feature extraction with temporal attention mechanisms for modelling dynamic behaviour in university classrooms. The model was trained and evaluated on a dataset of 5,000 annotated classroom behaviour samples collected from intelligent classrooms across multiple disciplines. Compared with baseline methods such as C3D and CNN-LSTM, the proposed model improves the F1-score by 5%-9% and raises recognition accuracy to over 90% for key interactive behaviours, including hand raising, standing, and note-taking. These results demonstrate the effectiveness of integrating spatial and temporal features for precise classroom behaviour recognition, and they provide quantitative support for intelligent classroom analysis without relying on subjective evaluation.
Luo et al. studied this question.
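
To make the idea of fusing per-frame spatial features with temporal attention more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the CNN has already produced one feature vector per video frame, and it uses a single scaled dot-product attention query to weight frames before pooling. The names `temporal_attention_pool`, `frame_features`, and `query` are hypothetical and do not come from the study; in a trained model the query would be a learned parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, query):
    """Pool per-frame feature vectors into one clip-level vector.

    frame_features: list of T vectors, one per frame (assumed to be
        precomputed CNN outputs; the CNN itself is not shown here).
    query: vector of the same dimension (hypothetical stand-in for a
        learned attention parameter).
    Returns (pooled_vector, attention_weights).
    """
    if not frame_features:
        raise ValueError("frame_features must contain at least one frame")
    dim = len(query)
    if any(len(frame) != dim for frame in frame_features):
        raise ValueError("every frame must match the query dimension")

    # Scaled dot-product score for each frame.
    scale = math.sqrt(dim)
    scores = [
        sum(f * q for f, q in zip(frame, query)) / scale
        for frame in frame_features
    ]
    weights = softmax(scores)

    # Attention-weighted sum of the frame vectors.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Three toy frames with two-dimensional features (illustrative values only).
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = temporal_attention_pool(frames, query=[2.0, 0.0])
    print("weights:", weights)
    print("pooled:", pooled)
```

In this sketch, frames whose features align with the query receive larger weights, so the pooled vector emphasises the moments most relevant to a behaviour such as hand raising. A classifier head over the pooled vector would then produce the behaviour label; that part is omitted because the abstract does not describe it.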