What question did this study set out to answer?

To develop a multimodal deep learning framework that accurately predicts interactive efficacy in college English classrooms.

April 21, 2026

Open Access

Analysis of interactive efficacy in college English classroom with multimodal machine learning

Key Points

Aimed to develop a multimodal deep learning framework that accurately predicts interactive efficacy in college English classrooms.
Developed MM-IEANet integrating visual, acoustic, and textual data for analysis.
Utilized CNNs for visual features, BiLSTMs for text, and 1D-CNNs and LSTM for audio.
Applied a Cross-Modal Transformer Fusion module for effective representation integration.
Achieved an improvement of over 12% in classification accuracy on a custom-labeled dataset.
Found that auditory attributes correlated most strongly with interactive success.
Validated approach across various student cohorts, indicating strong generalization.

Abstract

Interactive efficacy in college English classrooms improves language acquisition in modern educational environments. Traditional evaluation techniques often fail to capture student interest and achievement because they rely on subjective grading and fragmented analysis. This research proposes MM-IEANet, a multimodal deep learning framework that integrates visual, acoustic, and textual data on student activities to analyze and predict interactive efficacy. Our primary goal is to develop a robust system that accurately represents real-time student performance and instructor feedback through automated, multimodal processing. MM-IEANet extracts meaningful representations using modality-specific encoders: CNNs for visual features, BiLSTMs for text, and 1D-CNNs and LSTM for audio. A Cross-Modal Transformer Fusion module integrates these representations, and a Hierarchical Attention Network predicts efficacy by modality. On a custom-labeled dataset, MM-IEANet demonstrated an improvement of over 12% in classification accuracy and a considerable reduction in score prediction error. The attention mechanisms indicated which modalities most affected grading. Analysis showed that auditory attributes correlated most strongly with interactive success, followed by textual quality and visual presentation coherence. The approach also generalized well across student cohorts. In conclusion, MM-IEANet uses multimodal machine learning to evaluate English classroom engagement in a scalable, interpretable, and accurate manner.
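To make the fusion idea in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each modality encoder has already produced one fixed-length embedding, then applies a single round of scaled dot-product attention across the three modalities and pools the result with modality-level attention weights. The embedding values, the `context` vector, and the function names are hypothetical; a real model would learn projection matrices and the context vector during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(tokens):
    """Let each modality embedding attend to all modality embeddings.

    Uses scaled dot-product attention with identity projections
    (an assumption made here to keep the sketch short).
    """
    names = list(tokens)
    dim = len(tokens[names[0]])
    fused = {}
    for query_name in names:
        scores = [dot(tokens[query_name], tokens[k]) / math.sqrt(dim) for k in names]
        weights = softmax(scores)
        fused[query_name] = [
            sum(w * tokens[k][i] for w, k in zip(weights, names))
            for i in range(dim)
        ]
    return fused


def modality_attention_pool(fused, context):
    """Score each fused modality vector against a context vector and pool.

    Returns the pooled vector and the per-modality attention weights,
    which are the quantities one would inspect for interpretability.
    """
    names = list(fused)
    weights = softmax([dot(fused[n], context) for n in names])
    dim = len(context)
    pooled = [sum(w * fused[n][i] for w, n in zip(weights, names)) for i in range(dim)]
    return pooled, dict(zip(names, weights))


# Hypothetical encoder outputs (toy values, not data from the study).
embeddings = {
    "visual": [0.2, 0.1, 0.0, 0.3],
    "acoustic": [0.9, 0.4, 0.1, 0.7],
    "textual": [0.5, 0.2, 0.1, 0.4],
}
context = [1.0, 0.5, 0.0, 0.5]  # hypothetical; would be a learned parameter

fused = cross_modal_attention(embeddings)
pooled, modality_weights = modality_attention_pool(fused, context)
print(modality_weights)
```

In this sketch the per-modality weights sum to one, so they can be read as the relative contribution of each modality to the pooled representation, which is the kind of signal the abstract describes using to interpret which modalities most affected grading.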
