Key points are not available for this paper at this time.
Generalization is a central challenge for the deployment of reinforcement (RL) systems in the real world. In this paper, we show that the structure of the RL problem necessitates new approaches to beyond the well-studied techniques used in supervised learning. supervised learning methods can generalize effectively without explicitly for epistemic uncertainty, we show that, perhaps surprisingly, this not the case in RL. We show that generalization to unseen test conditions a limited number of training conditions induces implicit partial, effectively turning even fully-observed MDPs into POMDPs. by this observation, we recast the problem of generalization in RL as the induced partially observed Markov decision process, which we call epistemic POMDP. We demonstrate the failure modes of algorithms that do not handle this partial observability, and suggest a simple-based technique for approximately solving the partially observed. Empirically, we demonstrate that our simple algorithm derived from the POMDP achieves significant gains in generalization over current on the Procgen benchmark suite.
Ghosh et al. (Tue,) studied this question.