To explore or exploit? In this paper, we discuss the long-standing exploration-exploitation dilemma in the context of designing a learning controller for stunt-style driving with scarce samples. By making efficient use of a single expert demonstration, our algorithm leverages our intuitive understanding of driving to extract a coarse dynamics model from the collected driving data, then formulates the policy search in a gradient-update setting with a specially designed cost function. Both theoretical and empirical results are detailed and discussed.
Lau et al. (Tue,) studied this question.