July 4, 2026 · Open Access

A Survey on Large Language Models for Deep Reinforcement Learning

Key Points

This survey aims to explore the integration of large language models into deep reinforcement learning frameworks to improve learning efficiency and adaptability.
Comprehensive analysis of studies integrating LLMs with RL, categorized by the role the LLM plays: agent, reward designer, or training environment.
Examination of contributions, challenges, and future research directions at the intersection of LLMs and RL.
Identified three major roles of LLMs: enhancing decision-making, automating reward design, and creating training environments.
Reviewed evidence that LLMs improve sample efficiency and adaptability in RL settings.

Abstract

Deep reinforcement learning (RL) is a machine learning paradigm in which agents learn policies to maximize rewards through interactions with their environment. It has achieved remarkable success in domains such as robotic control, gaming, and autonomous driving. Large language models (LLMs), trained on massive collections of text data, have brought revolutionary progress to natural language processing. LLMs capture not only the grammar and semantics of natural language but also internalize commonsense knowledge and reasoning abilities. LLMs demonstrate strong performance in diverse tasks such as text generation and program synthesis, and their influence extends across a wide range of applications. To address the limitations of RL, a growing body of research has explored incorporating the knowledge, reasoning capabilities, and program generation skills of LLMs into RL frameworks. LLMs provide prior knowledge that facilitates policy learning from limited interactions, enable long-term prediction and efficient exploration, reduce the cost of reward design through natural language, and generate diverse environments that support generalization. These properties improve sample efficiency, support reward specification, and enhance the adaptability of training environments in RL. This paper presents a comprehensive survey of such studies, organizing them according to the roles that LLMs play within the RL process. Specifically, three major categories are examined: LLM-based agents, which enhance decision-making; LLM-based reward design, which automates or supports the construction of reward functions; and LLM-based training environments, which provide environments to facilitate learning. Based on this taxonomy, the survey analyzes contributions, challenges, and future research directions at the intersection of LLMs and RL.
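One way to realize the second role, LLM-based reward design, is to have the LLM write the reward function as executable code from a natural-language task description. The Python sketch below illustrates only that general pattern under stated assumptions; it is not a method taken from this survey. `query_llm` is a hypothetical stub that returns a fixed program instead of calling a real model, and the state keys `dist_to_goal` and `energy` are invented for illustration.

```python
from typing import Callable, Dict

State = Dict[str, float]
RewardFn = Callable[[State, State], float]


def query_llm(task_description: str) -> str:
    """Hypothetical stand-in for an LLM call.

    A real system would send task_description to a language model and
    receive reward code back; this stub ignores the input and returns a
    fixed, hand-written program so the sketch runs offline.
    """
    return (
        "def reward(prev, curr):\n"
        "    # reward progress toward the goal, penalize energy use\n"
        "    progress = prev['dist_to_goal'] - curr['dist_to_goal']\n"
        "    return progress - 0.1 * curr['energy']\n"
    )


def compile_reward(source: str) -> RewardFn:
    """Turn generated source code into a callable reward function."""
    namespace: dict = {}
    # exec runs arbitrary code: only acceptable for trusted or sandboxed output
    exec(source, namespace)
    return namespace["reward"]


# Natural-language task spec -> generated code -> reward function
reward_fn = compile_reward(
    query_llm("Move the robot to the goal while using little energy.")
)

# Score one hypothetical transition (prev state, next state)
r = reward_fn({"dist_to_goal": 2.0, "energy": 0.0},
              {"dist_to_goal": 1.5, "energy": 1.0})
print(r)
```

In a real pipeline the returned source would be untrusted model output, so it would need validation or sandboxing before `exec`; the compiled function would then be called on each transition in place of a hand-written reward.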
