• Multimodal attention–BiLSTM–XGBoost framework proposed
• Integrates BERT, ResNet-50 and Wav2Vec features
• Achieves 98.45% matching accuracy in education
• Improves Recall@1 (88%) and MRR (91%)
• Outperforms IoT-PAMD and CM-LEQA models

The rapid development of artificial intelligence (AI) and sensor technologies has enabled personalized and adaptive learning experiences. However, traditional educational resource-matching systems struggle to process multimodal data and capture complex semantic relationships, which limits their effectiveness in diverse learning environments. To address these challenges, this study proposes the Attentive Extreme Gradient Sequential Bidirectional Memory Net (AEGSBMN), a multimodal deep learning framework that combines Bidirectional Long Short-Term Memory (Bi-LSTM) networks, an attention mechanism for contextual feature weighting, and an Extreme Gradient Boosting (XGBoost) classifier for final decision-making. The framework was evaluated on a multimodal educational dataset containing textual content, annotated images, and synchronized speech data. Preprocessing comprised spectral gating for audio denoising, histogram equalization for image enhancement, and tokenization with stop-word removal for text normalization. Feature extraction employed BERT embeddings for text, ResNet-50 for visual data, and Wav2Vec for acoustic signals. The extracted features were fused through Bi-LSTM layers with attention, which capture temporal dependencies and highlight salient multimodal features, and the fused representation was passed to XGBoost for classification. Experimental results show that AEGSBMN achieves a matching accuracy of 97%, with improved recall and ranking metrics (Recall@1: 88%, Recall@5: 94%, MRR: 91%) and a reduced error rate (15.01%). These findings indicate that AEGSBMN enhances semantic alignment, learner comprehension, and adaptive resource retrieval in multimodal educational environments.
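The ranking metrics reported above can be illustrated with a minimal sketch. It assumes that each query has exactly one relevant resource and that the system returns a ranked list of resource identifiers; the function names, identifiers, and toy data are hypothetical, and the study's own evaluation protocol may differ.

```python
def recall_at_k(ranked_lists, relevant, k):
    """Fraction of queries whose relevant item appears in the top-k results.

    Assumes exactly one relevant item per query (an illustrative simplification).
    """
    hits = sum(
        1 for ranked, target in zip(ranked_lists, relevant) if target in ranked[:k]
    )
    return hits / len(relevant)


def mean_reciprocal_rank(ranked_lists, relevant):
    """Average of 1/rank of the relevant item; contributes 0 if it is not retrieved."""
    total = 0.0
    for ranked, target in zip(ranked_lists, relevant):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks are 1-based
    return total / len(relevant)


# Hypothetical toy data: three queries, each with a ranked list of resource IDs.
ranked = [
    ["r1", "r2", "r3"],  # relevant item at rank 1
    ["r5", "r4", "r6"],  # relevant item at rank 2
    ["r9", "r8", "r7"],  # relevant item at rank 3
]
relevant = ["r1", "r4", "r7"]

r_at_1 = recall_at_k(ranked, relevant, k=1)
r_at_3 = recall_at_k(ranked, relevant, k=3)
mrr = mean_reciprocal_rank(ranked, relevant)
```

In this toy case only the first query places its relevant resource at rank 1, so Recall@1 is 1/3, while Recall@3 is 1 because every relevant resource appears within the top three. MRR averages the reciprocal ranks 1, 1/2, and 1/3.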
The framework was implemented in Python, using PyTorch and Hugging Face Transformers for deep learning and XGBoost for classification.
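A rough PyTorch sketch of the fusion stage might look as follows. It is not the authors' implementation: the class name `AttentiveBiLSTM`, the additive-score attention pooling, the hidden size, and the per-step feature sizes (768 for text, 2048 for images, 768 for audio, concatenated at each time step) are all assumptions made for illustration, and random tensors stand in for real extracted features.

```python
import torch
import torch.nn as nn


class AttentiveBiLSTM(nn.Module):
    """Bi-LSTM encoder followed by attention pooling (illustrative sketch only)."""

    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(
            input_dim, hidden_dim, batch_first=True, bidirectional=True
        )
        # One scalar relevance score per time step.
        self.score = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        states, _ = self.lstm(x)  # (batch, seq_len, 2 * hidden_dim)
        # Softmax over the time dimension yields attention weights.
        weights = torch.softmax(self.score(states).squeeze(-1), dim=1)
        # Attention-weighted sum of the hidden states.
        pooled = (weights.unsqueeze(-1) * states).sum(dim=1)
        return pooled, weights


torch.manual_seed(0)
# Hypothetical fused input: 4 samples, 10 time steps, concatenated
# text (768) + image (2048) + audio (768) features per step.
fused = torch.randn(4, 10, 768 + 2048 + 768)
model = AttentiveBiLSTM(input_dim=fused.shape[-1], hidden_dim=64)
pooled, weights = model(fused)
```

Under these assumptions, `pooled` is one fixed-length vector per sample. Converted to a NumPy array, it could serve as the input features for an XGBoost classifier, matching the classification step described above.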
Jing Zhou (Sun,) studied this question.