July 13, 2021 · Open Access

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Abstract

Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world. In this paper, we show that the structure of the RL problem necessitates new approaches to generalization beyond the well-studied techniques used in supervised learning. While supervised learning methods can generalize effectively without explicitly accounting for epistemic uncertainty, we show that, perhaps surprisingly, this is not the case in RL. We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability, effectively turning even fully-observed MDPs into POMDPs. Informed by this observation, we recast the problem of generalization in RL as solving the induced partially observed Markov decision process, which we call the epistemic POMDP. We demonstrate the failure modes of algorithms that do not appropriately handle this partial observability, and suggest a simple ensemble-based technique for approximately solving the partially observed problem. Empirically, we demonstrate that our simple algorithm derived from the epistemic POMDP achieves significant gains in generalization over current methods on the Procgen benchmark suite.
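
The claim that a limited set of training conditions turns a fully observed MDP into a POMDP can be illustrated with a toy guessing task. This is a minimal sketch under stated assumptions, not the paper's algorithm or experimental setup: it assumes a uniform posterior over K hypothetical candidate MDPs, where in each MDP exactly one label earns reward 1 and ends the episode, while a wrong guess earns 0 and allows another try. The function names (`return_in_mdp`, `epistemic_return`, `memoryless_policy`, `make_eliminating_policy`) and the numbers used are illustrative assumptions.

```python
from typing import Callable, List, Sequence

# Hypothetical policy interface: maps the history of wrong guesses to the next guess.
Policy = Callable[[Sequence[int]], int]


def return_in_mdp(policy: Policy, true_label: int, gamma: float, horizon: int) -> float:
    """Discounted return of `policy` in the one candidate MDP whose answer is `true_label`.

    A correct guess yields reward 1 and ends the episode; a wrong guess yields 0
    and the agent guesses again on the next step, up to `horizon` steps.
    """
    wrong_guesses: List[int] = []
    for t in range(horizon):
        guess = policy(wrong_guesses)
        if guess == true_label:
            return gamma ** t
        wrong_guesses.append(guess)
    return 0.0


def epistemic_return(policy: Policy, num_labels: int, gamma: float, horizon: int) -> float:
    """Expected return when the true MDP is drawn uniformly from `num_labels` candidates."""
    total = sum(return_in_mdp(policy, k, gamma, horizon) for k in range(num_labels))
    return total / num_labels


def memoryless_policy(wrong_guesses: Sequence[int]) -> int:
    # Ignores history and always commits to label 0, like a deterministic
    # Markov policy that is optimal for just one of the candidate MDPs.
    return 0


def make_eliminating_policy(num_labels: int) -> Policy:
    # Builds a history-dependent policy that never repeats a ruled-out label.
    def policy(wrong_guesses: Sequence[int]) -> int:
        ruled_out = set(wrong_guesses)
        for k in range(num_labels):
            if k not in ruled_out:
                return k
        return 0  # every label ruled out (cannot happen if one label is correct)
    return policy


if __name__ == "__main__":
    K, GAMMA, HORIZON = 4, 0.9, 10
    print(epistemic_return(memoryless_policy, K, GAMMA, HORIZON))  # 0.25
    print(epistemic_return(make_eliminating_policy(K), K, GAMMA, HORIZON))  # about 0.86
```

In every single candidate MDP, always guessing the correct label is optimal and needs no memory. Averaged over the uncertainty about which MDP is the true one, however, the memoryless policy scores 0.25 while the history-dependent policy scores about 0.86 (with K = 4, gamma = 0.9, horizon 10). This toy gap mirrors, in simplified form, the implicit partial observability the abstract describes.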

Cite This Study

Ghosh et al. (2021). Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability.

synapsesocial.com/papers/6a0fc35fb6f5ee04015ff62c · https://doi.org/10.48550/arxiv.2107.06277