February 8, 2024

Open Access

Offline Actor-Critic Reinforcement Learning Scales to Large Models

Abstract

We show that offline actor-critic reinforcement learning can scale to large models, such as transformers, and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong supervised behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor-critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.

Authors

Jost Tobias Springenberg

Google (United States)

Abbas Abdolmaleki

DeepMind (United Kingdom)

Jingwei Zhang

South China University of Technology
