Chinese language education is increasingly vital for global exchange and collaboration. Traditional evaluation methods that rely on subjective observation and basic metrics overlook multimodal classroom dynamics, while existing automated models lack adaptability and accuracy in complex, data-driven learning environments. This study aims to develop an adaptive method capable of accurately assessing Chinese teaching effectiveness through machine learning (ML)-based multimodal analysis. A hybrid optimization and classification model, Weighted Honey Badger Optimized Categorical Boosting (WHBadger-CatBoost), is introduced; it integrates metaheuristic optimization with gradient boosting to enhance predictive performance and interpretability. The WHBadger-CatBoost-driven multimodal framework combines video, audio, and textual cues to measure engagement, comprehension, and instructional quality. A multimodal dataset of 1000 classroom samples was constructed from video recordings, audio interactions, images, and student-written submissions, representing diverse learning contexts and participant demographics. For preprocessing, spectrogram enhancement using min–max scaling was applied to the audio data, tokenization and stop-word removal to the text, and frame sampling with resizing to the visual inputs. Principal Component Analysis (PCA)-based embedding captured speech dynamics, facial expressions, and gesture-based engagement indicators. The WHBadger algorithm optimized CatBoost’s hyperparameters, balancing exploration and exploitation to improve classification accuracy while minimizing overfitting. The framework was implemented in Python using Scikit-learn, CatBoost, and NumPy. The model achieved an accuracy of 0.95, indicating its reliability in educational assessment. The proposed multimodal ML framework enables comprehensive, adaptive, and data-driven evaluation of Chinese teaching effectiveness, supporting continuous improvement in instructional methodologies.
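The preprocessing and embedding steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the stop-word list, and the SVD-based PCA are assumptions made for illustration, and the regex tokenizer only suits whitespace-delimited text (Chinese submissions would likely need a dedicated word segmenter).

```python
import re
import numpy as np

# Hypothetical stop-word list for illustration; the abstract does not specify one.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def min_max_scale(spectrogram):
    """Rescale spectrogram values to [0, 1]; a constant input maps to zeros."""
    spec = np.asarray(spectrogram, dtype=float)
    lo, hi = spec.min(), spec.max()
    if hi == lo:
        return np.zeros_like(spec)
    return (spec - lo) / (hi - lo)


def tokenize(text):
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def sample_frames(n_frames, step):
    """Return the indices of every `step`-th video frame."""
    return list(range(0, n_frames, step))


def pca_embed(features, n_components):
    """Project mean-centered features onto their top principal components."""
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T
```

In a fuller pipeline, the embedded features from each modality would presumably be concatenated and passed to the classifier whose hyperparameters the WHBadger search tunes; that stage is omitted here because the abstract gives no details of the search procedure.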
Yaling Tang (Sun,) studied this question.