To address the theoretical gap in personalised matching of multimodal learning resources, this paper studies a cross-modal recommendation method for English teaching resources based on deep fusion of interest information. First, user behaviour data, attribute features and learning characteristics from the English learning platform are collected and analysed to construct a multidimensional interest fusion model. Second, the BERT model is integrated with a cross-modal attention perception method to build the CCA-BERT recommendation model, which performs deep feature extraction and semantic association modelling over multimodal English teaching resources such as video, text and audio. Finally, personalised resource recommendation is completed based on click probability, which overcomes the limitations of traditional single-modality recommendation. Experimental results show that the proposed cross-modal recommendation approach achieves a user satisfaction level exceeding 94.7% while maintaining recommendation diversity above 0.87% across the experimental evaluations.
Liu et al. (Thu,) studied this question.
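
The last two stages summarised in the abstract, cross-modal attention over video, text and audio features followed by ranking on click probability, can be illustrated with a minimal sketch. This is not the CCA-BERT model itself: the three-dimensional vectors stand in for BERT-derived embeddings, the scaled dot-product attention and the sigmoid scoring are assumptions chosen for illustration, and all names (`cross_modal_attention`, `click_probability`, `recommend`, the toy resources) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(query, modality_vectors):
    """Fuse per-modality vectors with scaled dot-product attention.

    `query` plays the role of the fused user-interest vector (assumption);
    `modality_vectors` holds one embedding per modality (video, text, audio).
    Returns the fused vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, v) / scale for v in modality_vectors])
    fused = [
        sum(w * v[i] for w, v in zip(weights, modality_vectors))
        for i in range(len(query))
    ]
    return fused, weights


def click_probability(user_interest, fused):
    """Sigmoid of the user-resource affinity (illustrative scoring choice)."""
    return 1.0 / (1.0 + math.exp(-dot(user_interest, fused)))


def recommend(user_interest, resources, top_k=2):
    """Rank resources by estimated click probability, highest first."""
    scored = []
    for name, modality_vectors in resources.items():
        fused, _ = cross_modal_attention(user_interest, modality_vectors)
        scored.append((name, click_probability(user_interest, fused)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data: each resource has [video, text, audio] embeddings (hypothetical).
user = [1.0, 0.0, 0.5]
resources = {
    "grammar_video": [[0.9, 0.1, 0.4], [0.8, 0.0, 0.6], [0.7, 0.2, 0.3]],
    "listening_quiz": [[-0.5, 0.9, 0.0], [-0.2, 0.8, 0.1], [0.0, 1.0, -0.3]],
    "vocab_article": [[0.3, 0.3, 0.3], [0.2, 0.4, 0.1], [0.4, 0.1, 0.2]],
}
ranking = recommend(user, resources, top_k=2)
print(ranking)
```

In this toy setting the attention weights favour whichever modality aligns best with the user's interest vector, so a resource is scored mainly on its most relevant modality rather than on a plain average of all three.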