Traditional online reinforcement learning (RL) systems actively interact with their environments to acquire data, with the goal of learning an optimal policy that maximizes a predefined cumulative reward. However, where cost and safety are paramount, online RL is often impractical. Offline RL offers a viable alternative: it uses previously collected datasets to learn an effective policy without further interaction with the environment. A key obstacle in offline RL is its tendency to overestimate the values of actions that are not adequately represented in the data, known as out-of-distribution (OOD) actions. While previous approaches have typically sought to enhance performance through increased algorithmic complexity, this article introduces a novel methodology that significantly improves offline performance with only a minor additional memory cost. This study analyzes how to retain high performance throughout fully offline training. Since offline learning cannot correct errors without interacting with the environment, it is highly dependent on the dataset. For example, a good policy can never be trained on a random dataset. On the other hand, even if an algorithm performs well temporarily, it can easily run into OOD errors. Instead of resorting to complicated policy regularization, we propose a simple center replacement approach that adjusts the offline dataset to suit the proposed algorithm, so that OOD errors are avoided and training performance improves. Our method introduces an adaptive regularization target that evolves with policy improvement, effectively relaxing the conservatism constraint over time without requiring online interaction.
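The OOD overestimation problem mentioned above can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the method of the article: the function names (`greedy_value`, `simulate`), the single-state setting, the Gaussian noise model for unseen actions, and all parameter values are hypothetical choices made for illustration. Every action has a true value of 0; actions covered by the dataset are estimated exactly, while OOD actions receive noisy estimates. Taking the greedy maximum over all actions then yields a positively biased value, whereas restricting the maximum to in-dataset actions does not.

```python
import random


def greedy_value(q_values, allowed):
    """Return the largest Q estimate among the allowed action indices."""
    return max(q_values[a] for a in allowed)


def simulate(num_actions=10, num_seen=3, noise=1.0, trials=2000, seed=0):
    """Average greedy value with and without restricting to seen actions.

    Hypothetical toy setting: the true value of every action is 0.
    Actions with index < num_seen are covered by the dataset and are
    estimated exactly; the rest are OOD and get Gaussian estimation noise.
    """
    rng = random.Random(seed)
    seen = range(num_seen)
    all_actions = range(num_actions)
    total_all = 0.0
    total_seen = 0.0
    for _ in range(trials):
        q = [
            0.0 if a < num_seen else rng.gauss(0.0, noise)
            for a in range(num_actions)
        ]
        total_all += greedy_value(q, all_actions)   # max includes OOD actions
        total_seen += greedy_value(q, seen)         # max over dataset actions only
    return total_all / trials, total_seen / trials


unconstrained, constrained = simulate()
print(f"greedy value over all actions:  {unconstrained:.3f}")
print(f"greedy value over seen actions: {constrained:.3f}")
```

Although every true value is 0, the unconstrained estimate is positive because the maximum selects whichever OOD action happens to have the largest noise. Offline methods that keep the policy close to the data aim to avoid this bias.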
Huihui Zhang
Shanghai Construction Group (China)
Guoyin Chen
Shanghai Construction Group (China)
IEEE Transactions on Neural Networks and Learning Systems
Zhang et al. studied this question.
synapsesocial.com/papers/6a265bb6ad53cfb9357c52d4 — DOI: https://doi.org/10.1109/tnnls.2026.3688222