In visual reinforcement learning (RL), generalization to new environments is a central challenge. This study pioneers a theoretical analysis of visual RL generalization, establishing an upper bound on the generalization objective that consists of a policy divergence term and a Bellman error term. Motivated by this analysis, we propose maintaining cross-domain consistency for each policy in the policy space, which can reduce the divergence of the learned policy at test time. In practice, we introduce the Truncated Return Prediction (TRP) task, which promotes cross-domain policy consistency by predicting the truncated returns of historical trajectories, together with a Transformer-based predictor for this auxiliary task. Extensive experiments on the DeepMind Control Suite and robotic manipulation tasks show that TRP achieves state-of-the-art generalization performance. TRP also outperforms previous methods in sample efficiency during training.
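The abstract does not define the TRP target precisely. As a minimal sketch, assuming the standard k-step discounted truncated return G_t = sum_{i=0}^{k-1} gamma^i r_{t+i} (no bootstrapping, terms past the end of the trajectory dropped) and a mean-squared-error objective, the regression targets for one recorded trajectory could be computed as below. The names `truncated_returns`, `trp_loss`, `horizon`, and `gamma` are hypothetical illustrations, not the paper's implementation, and the Transformer-based predictor is not shown.

```python
from typing import List, Sequence


def truncated_returns(rewards: Sequence[float], horizon: int, gamma: float = 0.99) -> List[float]:
    """Hypothetical helper: discounted truncated return for every start index t.

    G_t = sum_{i=0}^{horizon-1} gamma**i * rewards[t + i], where rewards past the
    end of the trajectory are simply omitted (no value bootstrapping).
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    n = len(rewards)
    returns: List[float] = []
    for t in range(n):
        g, discount = 0.0, 1.0
        # Accumulate at most `horizon` discounted rewards starting at step t.
        for i in range(t, min(t + horizon, n)):
            g += discount * rewards[i]
            discount *= gamma
        returns.append(g)
    return returns


def trp_loss(predicted: Sequence[float], rewards: Sequence[float],
             horizon: int, gamma: float = 0.99) -> float:
    """Hypothetical auxiliary loss: mean squared error against truncated-return targets."""
    targets = truncated_returns(rewards, horizon, gamma)
    if len(predicted) != len(targets):
        raise ValueError("one prediction per time step is expected")
    return sum((p - y) ** 2 for p, y in zip(predicted, targets)) / len(targets)


if __name__ == "__main__":
    # Four unit rewards, 2-step horizon, gamma = 0.5.
    print(truncated_returns([1.0, 1.0, 1.0, 1.0], horizon=2, gamma=0.5))  # [1.5, 1.5, 1.5, 1.0]
```

In this sketch the target for the last step is shorter than `horizon` because the trajectory ends; how the paper handles trajectory boundaries, the horizon length, and the discount factor is not stated in the abstract.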
Wang et al. studied this question.
www.synapsesocial.com/papers/68e72962b6db6435876a3344 — DOI: https://doi.org/10.1609/aaai.v38i6.28369
Shuo Wang
Zhihao Wu
Xiaobo Hu