Deep reinforcement learning (RL) is a machine learning paradigm in which agents learn policies to maximize rewards through interactions with their environment. It has achieved remarkable success in domains such as robotic control, gaming, and autonomous driving. Large language models (LLMs), trained on massive collections of text data, have brought revolutionary progress to natural language processing. LLMs not only capture the grammar and semantics of natural language but also internalize commonsense knowledge and reasoning abilities. LLMs demonstrate strong performance in diverse tasks such as text generation and program synthesis, and their influence extends across a wide range of applications. To address the limitations of RL, a growing body of research has explored incorporating the knowledge, reasoning capabilities, and program generation skills of LLMs into RL frameworks. LLMs provide prior knowledge that facilitates policy learning from limited interactions, enable long-term prediction and efficient exploration, reduce the cost of reward design through natural language, and generate diverse environments that support generalization. These properties improve sample efficiency, support reward specification, and enhance the adaptability of training environments in RL. This paper presents a comprehensive survey of such studies, organizing them according to the roles that LLMs play within the RL process. Specifically, three major categories are examined: LLM-based agents, which enhance decision-making; LLM-based reward design, which automates or supports the construction of reward functions; and LLM-based training environments, which provide environments to facilitate learning. Based on this taxonomy, the survey analyzes contributions, challenges, and future research directions at the intersection of LLMs and RL.
Suzuki et al. (Wed,) studied this question.