Key points are not available for this paper at this time.
We address the problem of single-agent, autonomous sequential decision making. We assume that some controllers or behavior policies are given as prior knowledge, and the task of the agent is to learn how to switch between these policies. We formulate the problem using the framework of reinforcement learning and options (Sutton, Precup Precup, 2000). We derive gradient-based algorithms for learning the termination conditions of options, with the goal of optimizing the expected long-term return. We incorporate the proposed approach into policy-gradient methods with linear function approximation.
Comanici et al. (Mon,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: