July 1, 2021 (Open Access)

Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

Abstract

Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets. However, depending on the quality of the trained agents and the application being considered, it is often desirable to fine-tune such agents via further online interactions. In this paper, we observe that state-action distribution shift may lead to severe bootstrap error during fine-tuning, which destroys the good initial policy obtained via offline RL. To address this issue, we first propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples from the offline dataset. Furthermore, we leverage multiple Q-functions trained pessimistically offline, thereby preventing overoptimism concerning unfamiliar actions at novel states during the initial training phase. We show that the proposed method improves the sample efficiency and final performance of the fine-tuned robotic agents on locomotion and manipulation tasks. Our code is available at: https://github.com/shlee94/Off2OnRL.
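
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that), and it rests on assumptions the abstract does not state: each offline transition carries a hypothetical "near-on-policy" score in [0, 1], online transitions receive a fixed priority, and the ensemble is aggregated by simple averaging. The names `balanced_sample`, `ensemble_q`, `offline_scores`, and `online_priority` are illustrative only.

```python
import random
from statistics import mean


def balanced_sample(offline, online, offline_scores, batch_size,
                    online_priority=1.0, rng=None):
    """Draw a batch from the union of offline and online transitions.

    Assumption: offline_scores[i] is a hypothetical score in [0, 1] saying
    how close offline[i] is to the current policy's behavior. Online
    transitions all receive `online_priority`, so they are favored unless
    an offline transition looks near-on-policy.
    """
    if len(offline) != len(offline_scores):
        raise ValueError("need exactly one score per offline transition")
    rng = rng or random.Random()
    pool = list(offline) + list(online)
    weights = list(offline_scores) + [online_priority] * len(online)
    # Sample with replacement, with probability proportional to the weights.
    return rng.choices(pool, weights=weights, k=batch_size)


def ensemble_q(q_functions, state, action):
    """Combine several Q-estimates by averaging (an assumed aggregation).

    Each element of `q_functions` is a callable (state, action) -> float,
    standing in for a Q-network that was trained pessimistically offline.
    """
    return mean(q(state, action) for q in q_functions)
```

As a usage illustration, a batch drawn with `balanced_sample(offline, online, scores, batch_size=256)` would feed the fine-tuning update, and `ensemble_q` would supply the value estimate used in the bootstrap target. How the scores are estimated and how pessimism is enforced during offline training are left open here.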
