Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may result in failure of Dyna agents. In this paper, we investigate one type of model error: hallucinated states. These are states generated by the model that are not real states of the environment. We present the Hallucinated Value Hypothesis (HVH): updating values of real states towards values of hallucinated states results in misleading state-action values, which adversely affect the control policy. We discuss and evaluate four Dyna variants: three which update real states toward simulated -- and therefore potentially hallucinated -- states, and one which does not. The experimental results provide evidence for the HVH, thus suggesting a fruitful direction toward developing Dyna algorithms robust to model error.
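To make the mechanism behind the HVH concrete, the following is a minimal tabular sketch, not the paper's algorithms or experimental setup. It assumes a Q-learning style backup, a hand-written `flawed_model` that always predicts a nonexistent state named `"ghost"`, and a spurious value assigned to that state by hand; the constants `GAMMA` and `ALPHA` and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict

GAMMA = 0.9        # discount factor (assumed for illustration)
ALPHA = 0.5        # step size (assumed for illustration)
ACTIONS = (0, 1)   # hypothetical two-action problem


def q_update(q, s, a, r, s_next):
    """Back up Q(s, a) toward r + GAMMA * max_b Q(s_next, b)."""
    target = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def planning_step(q, model, s, a):
    """Dyna-style planning: simulate one transition from a real state
    with the model, then update the real state toward the simulated one."""
    r, s_next = model(s, a)
    q_update(q, s, a, r, s_next)


def flawed_model(s, a):
    """Hypothetical imperfect model: predicts zero reward and a state
    ("ghost") that is not a state of the environment."""
    return 0.0, "ghost"


q = defaultdict(float)
q[("ghost", 0)] = 10.0  # spurious value on the hallucinated state, set by hand

# The real state "s0" is updated toward the hallucinated state's value.
planning_step(q, flawed_model, "s0", 1)
print(q[("s0", 1)])
```

In this sketch the real pair `("s0", 1)` acquires a positive value even though no real transition ever produced reward, which is the kind of misleading state-action value the hypothesis describes. A variant that never uses a simulated state as the update target would leave `("s0", 1)` untouched by this planning step.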
Jafferjee et al. (Mon,) studied this question.