What question did this study set out to answer?

The study asks how to improve interactive recommendation systems when reinforcement learning policies trained on static offline logs suffer from distribution shift and poor sample efficiency in online use.

April 23, 2026 | Open Access

Adaptive reinforcement learning for recommendation via large language models and knowledge graphs

Key Points

Developed the ARLK framework combining large language model-guided offline pretraining and knowledge graph-enhanced online learning.
Formulated the recommendation task as a Markov decision process to model user preferences dynamically.
Implemented an adaptive policy fusion mechanism for transitioning from offline to online learning.
ARLK showed average reward improvements of 5.15%, 3.40%, and 1.80% on the LFM, Industry, and Coat datasets, respectively.
Achieved a gain of up to 12.73% in Recall@10 on the Coat dataset compared to existing methods.
Demonstrated substantial enhancement in both initial recommendation quality and long-term performance.
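
The key points mention an adaptive policy fusion mechanism but do not say how the offline and online policies are combined. The following is only a minimal sketch of one plausible scheme, assuming a linear blend of per-item scores whose weight on the offline policy decays with the number of online interactions. The names fusion_weight, fuse_scores, recommend, and the half_life parameter are hypothetical and are not taken from the paper.

```python
def fusion_weight(step: int, half_life: float = 100.0) -> float:
    """Weight placed on the offline (pretrained) policy.

    Assumption: exponential decay, so the weight is 1.0 at step 0 and
    halves every `half_life` online interactions.
    """
    return 0.5 ** (step / half_life)


def fuse_scores(offline_scores, online_scores, step, half_life=100.0):
    """Linearly blend per-item scores from the two policies."""
    if len(offline_scores) != len(online_scores):
        raise ValueError("both policies must score the same candidate items")
    w = fusion_weight(step, half_life)
    return [w * off + (1.0 - w) * on
            for off, on in zip(offline_scores, online_scores)]


def recommend(offline_scores, online_scores, step, half_life=100.0):
    """Return the index of the item with the highest fused score."""
    fused = fuse_scores(offline_scores, online_scores, step, half_life)
    return max(range(len(fused)), key=fused.__getitem__)
```

Under these assumptions, early recommendations follow the offline policy and later ones follow the online policy, which mirrors the transition from offline initialization to online adaptation that the summary describes. The actual ARLK mechanism may adapt the weight differently, for example from observed feedback rather than from a fixed schedule.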

Abstract

Interactive recommendation systems (IRS) have become a prominent research topic as they dynamically optimize user experience through real-time feedback loops. To model the evolving dynamics of user preferences and maximize long-term rewards, reinforcement learning (RL) has been incorporated into IRS by formulating the recommendation process as a Markov decision process (MDP). However, RL policies trained on static offline data still face two major challenges: (1) distribution shift, where the mismatch between offline logs and dynamic online environments often leads to suboptimal long-term decision-making; and (2) sample efficiency, as the large action space in recommendation tasks requires substantial interaction before achieving optimal performance. To address these issues, we propose ARLK (Adaptive Reinforcement Learning with Large Language Models and Knowledge Graphs), a novel adaptive framework that combines large language model (LLM)-guided offline pretraining and knowledge graph (KG)-enhanced online learning via an adaptive policy fusion mechanism that smoothly transitions from offline initialization to online adaptation. LLMs provide strong semantic understanding that can capture user preferences and simulate interaction feedback, thereby improving policy pretraining and ensuring high-quality initial recommendations in simulation-based online evaluation. Meanwhile, the structured information in KGs is utilized during policy learning to guide candidate generation and significantly reduce exploration cost. Experiments on three benchmark datasets demonstrate that ARLK achieves substantial improvements in both initial recommendation quality and long-term performance compared with state-of-the-art baselines, with average reward improvements of 5.15%, 3.40%, and 1.80% on LFM, Industry, and Coat datasets, respectively, and up to 12.73% gain in Recall@10 on the Coat dataset.
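
For readers unfamiliar with the Recall@10 metric reported above, the sketch below shows the common definition of Recall@K: the fraction of a user's relevant items that appear among the top K recommendations. The function name recall_at_k and the toy item IDs are illustrative assumptions. The abstract does not state which variant the authors use (some papers normalize by min(K, number of relevant items) instead), so this should not be read as their exact evaluation code.

```python
def recall_at_k(ranked_items, relevant_items, k=10):
    """Fraction of relevant items that appear in the top-k of a ranking.

    ranked_items:   item IDs ordered from most to least recommended.
    relevant_items: item IDs the user actually interacted with or liked.
    """
    relevant = set(relevant_items)
    if not relevant:
        # Convention assumed here: recall is 0.0 when nothing is relevant.
        return 0.0
    top_k = set(ranked_items[:k])
    return len(top_k & relevant) / len(relevant)


# Toy example with hypothetical item IDs: one of the three relevant
# items ("a") is in the top 2, so recall is 1/3.
score = recall_at_k(["a", "b", "c"], ["a", "c", "z"], k=2)
```

A system-level Recall@10 is then typically the mean of this value over all evaluated users.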
