Video clips became extremely popular as viewers shifted away from television and toward mobile devices. However, the ability of recommendation systems to reflect real-world distributions is limited by how accurately and efficiently they can obtain data and evaluation criteria. This paper describes a new recommendation system that combines Graph Convolutional Networks (GCNs), a self-attention mechanism, and Deep Reinforcement Learning (DRL) to capture multimodal user-item interactions together with user preferences and item attributes. Modality-specific graphs represent user preferences and item attributes accurately, and the separate visual, audio, and textual modalities together provide a comprehensive representation of multimodal features. A multi-head attention mechanism assigns adaptive weights to neighbors during aggregation, while dynamic negative sampling selects hard negative items, that is, items similar to positive interactions that the user has not engaged with. DRL allows the model to adapt to changing user preferences by optimizing recommendation policies based on cumulative reward signals from sequential user interactions. In experimental evaluations, the integrated model consistently outperformed baseline models and partially upgraded configurations. The results show that incorporating multimodal data and reinforcement learning improves recommendation performance, particularly in terms of Precision@K, Recall@K, and NDCG@K.
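The ranking metrics and the hard negative selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary relevance for NDCG@K, cosine similarity over item embeddings as the notion of "similar", and all function names (`precision_at_k`, `recall_at_k`, `ndcg_at_k`, `hard_negatives`) are hypothetical names chosen for illustration.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """NDCG@K with binary relevance (assumption: gain is 1 for a hit, else 0)."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hard_negatives(positive_vec, item_vecs, interacted, n):
    """Pick the n items most similar to a positive item that the user
    has NOT interacted with (assumed similarity: cosine on embeddings)."""
    scored = [
        (cosine(positive_vec, vec), item)
        for item, vec in item_vecs.items()
        if item not in interacted
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:n]]
```

In a training loop, a sampler like `hard_negatives` would typically be re-run as embeddings change, which is one plausible reading of "dynamic" here; the abstract does not specify the exact procedure.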
Xi Zhang
General Cardiology
Jun Yin
Jiangnan University
Scientific Reports
Zhang et al. (Tue,) studied this question.
synapsesocial.com/papers/6a17dc853fad632b0f9d9361 — DOI: https://doi.org/10.1038/s41598-026-53106-1