This research focuses on using a transformer-based chatbot to improve English speaking skills in blended learning contexts. Learner data were gathered through speech recordings, interaction logs, and engagement monitoring. Preprocessing consisted of denoising speech with a Wiener filter, imputing missing values in the textual interaction metrics, and normalizing posture and behavioral cues with min-max scaling. For feature extraction, Mel-Frequency Cepstral Coefficients (MFCCs) were computed from the speech signals, Term Frequency-Inverse Document Frequency (TF-IDF) was applied to the textual interactions, and ResNet-18 was used to extract behavioral and engagement features. The proposed transformer-based model, the Dynamic Slime Mold-mutated Robustly Optimized Bidirectional Encoder Representations from Transformers-Pretraining Approach (DSM-RoBERTa), performs speech recognition and context-sensitive dialogue generation. It builds on RoBERTa for language understanding and fine-tunes its parameters to improve accuracy. The chatbot offered incremental practice at the phonetic, semantic, and freestyle levels, giving real-time feedback on pronunciation accuracy, fluency, and engagement while tracking learner progress. Model performance was assessed with accuracy, precision, recall, F1-score, word error rate (WER), learning efficiency, and user satisfaction. On a held-out test set of 2,500 learners, DSM-RoBERTa achieved 98.85% accuracy, 98.23% precision, 97.15% recall, a 98.31% F1-score, 23.2% learning efficiency, 88% user satisfaction, and a WER of 0.068, with consistent results across cross-validation folds. These findings suggest that the DSM-RoBERTa framework can support adaptive, context-aware, and progressive speaking practice, yielding a scalable, immersive, and personalized language learning environment. The approach offers a dependable option for blended English as a Foreign Language (EFL) training, linking classroom learning with independent practice to improve learners' speaking skills.
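The three preprocessing steps can be illustrated with a minimal NumPy/SciPy sketch. It assumes `scipy.signal.wiener` with a hypothetical 11-sample window, column-mean imputation (the abstract does not say which imputation rule was used), and per-column min-max scaling to [0, 1]. The function names and the synthetic test tone are hypothetical.

```python
import numpy as np
from scipy.signal import wiener


def denoise_speech(signal: np.ndarray, window: int = 11) -> np.ndarray:
    """Local Wiener filter over a 1-D waveform (window size is a hypothetical choice)."""
    return wiener(np.asarray(signal, dtype=float), mysize=window)


def impute_missing(metrics: np.ndarray) -> np.ndarray:
    """Fill NaNs in each column (one interaction metric per column) with the column mean."""
    filled = np.array(metrics, dtype=float)  # copy so the caller's array is untouched
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def min_max_scale(features: np.ndarray) -> np.ndarray:
    """Rescale each column to [0, 1]; constant columns map to 0."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (features - lo) / span


# Usage: a noisy 50 Hz tone sampled at 8 kHz stands in for a speech recording.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 50 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = denoise_speech(noisy)
```

Column-mean imputation is only one option; median or model-based imputation would slot into `impute_missing` without changing the other steps.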
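TF-IDF weighting of the interaction text can be sketched in plain Python as follows. This assumes lowercasing, whitespace tokenization, tf as the relative term frequency within a document, and idf = ln(N/df); the study's tokenizer and weighting variant are not stated, and `tf_idf` is a hypothetical helper.

```python
import math
from collections import Counter


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = ln(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Usage: three short learner utterances.
docs = ["I want to order coffee", "I want to book a room", "coffee please"]
weights = tf_idf(docs)
```

With this unsmoothed idf, a term that occurs in every document receives weight 0. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its values differ from this sketch.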
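The abstract does not describe how the dynamic slime mold mutation tunes DSM-RoBERTa. As an assumption only, the sketch below applies the standard slime mold algorithm update (fitness-ranked weights W, oscillation ranges a = arctanh(1 - t/T) and b = 1 - t/T, and a small random-restart probability z) plus a hypothetical decaying Gaussian mutation of the best point, to a hypothetical two-parameter search over log10 learning rate and dropout. `toy_validation_loss` stands in for an actual fine-tuning run, and every name and default here is illustrative.

```python
import numpy as np


def toy_validation_loss(x: np.ndarray) -> float:
    """Hypothetical stand-in for a fine-tuning run: x = [log10(learning rate), dropout]."""
    return float((x[0] + 4.5) ** 2 + (x[1] - 0.1) ** 2)


def slime_mold_search(loss, lb, ub, n_agents=30, n_iter=100, z=0.03,
                      mutation_scale=0.1, seed=0):
    """Minimize `loss` over the box [lb, ub] with the standard slime mold update
    plus a hypothetical decaying Gaussian mutation of the best solution."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    X = lb + rng.random((n_agents, dim)) * (ub - lb)  # random initial population
    best_x, best_f = X[0].copy(), np.inf
    for t in range(1, n_iter + 1):
        fitness = np.array([loss(x) for x in X])
        order = np.argsort(fitness)  # ascending: best agent first
        if fitness[order[0]] < best_f:
            best_f, best_x = float(fitness[order[0]]), X[order[0]].copy()
        b_f, w_f = fitness[order[0]], fitness[order[-1]]
        # Fitness-ranked weights: better half gets W >= 1, worse half W <= 1.
        rank = np.empty(n_agents, dtype=int)
        rank[order] = np.arange(n_agents)
        log_term = np.log10((fitness - b_f) / (w_f - b_f + 1e-12) + 1.0)[:, None]
        r = rng.random((n_agents, dim))
        W = np.where((rank < n_agents // 2)[:, None], 1 + r * log_term, 1 - r * log_term)
        a = np.arctanh(1 - t / n_iter)  # oscillation range, shrinks to 0
        b = 1 - t / n_iter
        p = np.tanh(np.abs(fitness - best_f))
        new_X = X.copy()
        for i in range(n_agents):
            if rng.random() < z:  # occasional random restart keeps exploring
                new_X[i] = lb + rng.random(dim) * (ub - lb)
                continue
            vb = rng.uniform(-a, a, dim)
            vc = rng.uniform(-b, b, dim)
            A, B = rng.integers(n_agents, size=2)
            approach = best_x + vb * (W[i] * X[A] - X[B])
            new_X[i] = np.where(rng.random(dim) < p[i], approach, vc * X[i])
        X = np.clip(new_X, lb, ub)
        # Hypothetical "dynamic mutation": a Gaussian step around the best point
        # whose scale decays over the run; it is kept only if it lowers the loss.
        step = mutation_scale * (1 - t / n_iter + 1e-3) * (ub - lb)
        cand = np.clip(best_x + rng.normal(0.0, 1.0, dim) * step, lb, ub)
        cand_f = loss(cand)
        if cand_f < best_f:
            best_f, best_x = cand_f, cand
    return best_x, best_f


# Usage: search log10(learning rate) in [-6, -3] and dropout in [0, 0.5].
best_x, best_f = slime_mold_search(toy_validation_loss, lb=[-6.0, 0.0], ub=[-3.0, 0.5])
```

Because the mutation step is accepted only when it lowers the loss, it can never make the tracked best solution worse; in a real setting each `loss` call would be a full fine-tuning and validation run, so `n_agents * n_iter` would need to be far smaller.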
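The classification metrics and WER listed above can be computed as follows for a single binary task (for example, mispronounced versus correctly pronounced words, a hypothetical labeling). WER is the word-level edit distance between a reference transcript and the recognizer output divided by the number of reference words, so a WER of 0.068 corresponds to about 6.8 errors per 100 reference words. The helper names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between ref[:i - 1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,                       # deletion
                          curr[j - 1] + 1,                   # insertion
                          prev[j - 1] + (r_word != h_word))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one binary labeling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Usage: one substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
acc, prec, rec, f1 = classification_metrics([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1])
```

For multiclass tasks, precision, recall, and F1 are usually computed per class and then averaged; under such averaging, the averaged F1 need not equal the harmonic mean of the averaged precision and recall.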
Dailing Ji (Tue,) studied this question.