Abstract Generative Adversarial Imitation Learning (GAIL) can learn policies without prior knowledge of the underlying reward function. However, it often suffers from limited sample efficiency because it relies on reinforcement learning for policy learning, which requires extensive real-time interaction with the environment. To address this challenge, this paper introduces a refined framework named TM-GAIL, which combines transition function model learning with GAIL. The approach uses neural networks to construct a transition function model, which generates virtual samples to complement real data. The discriminator is trained on these virtual samples alongside expert demonstration data. The policy is learned from virtual samples, real samples, and the reward derived from the discriminator. Furthermore, a self-adaptive error control module is designed to target regions characterized by high returns and to mitigate model errors. Empirical findings demonstrate that TM-GAIL significantly improves sample efficiency compared with imitation learning and model-free methods. It achieves performance close to that of domain experts on both continuous and discrete tasks.
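The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. A least-squares linear model stands in for the neural-network transition model. The toy dynamics (`A`, `B`, `c`), the helper names, and the reward form `-log(1 - D(s, a))` are assumptions made for illustration, the last being one common GAIL convention in which `D` is read as the probability that a state-action pair came from the expert. The self-adaptive error control module is not modelled here.

```python
import numpy as np

def fit_transition_model(states, actions, next_states):
    """Fit s' ~ [s, a, 1] @ W by least squares.

    Assumption: a linear model replaces the neural network in the paper.
    """
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def predict_next_state(W, states, actions):
    """Generate virtual next states from the fitted model."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    return X @ W

def gail_reward(d_prob, eps=1e-8):
    """Surrogate reward -log(1 - D(s, a)).

    Assumption: d_prob is the discriminator's probability of 'expert'.
    """
    return -np.log(np.clip(1.0 - d_prob, eps, 1.0))

rng = np.random.default_rng(0)

# Hypothetical noise-free linear dynamics used to produce "real" samples.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.5, -0.2]])
c = np.array([0.1, -0.1])
states = rng.normal(size=(200, 2))
actions = rng.normal(size=(200, 1))
next_states = states @ A + actions @ B + c

# 1. Learn the transition model from real interaction data.
W = fit_transition_model(states, actions, next_states)

# 2. Generate virtual samples by querying the model with new actions.
virtual_states = states[:50]
virtual_actions = rng.normal(size=(50, 1))
virtual_next = predict_next_state(W, virtual_states, virtual_actions)

# 3. Combine real and virtual transitions into one policy-learning batch.
batch_states = np.vstack([states, virtual_states])
batch_actions = np.vstack([actions, virtual_actions])
batch_next = np.vstack([next_states, virtual_next])

# 4. Score the batch with placeholder discriminator outputs.
d_prob = rng.uniform(0.05, 0.95, size=len(batch_states))
rewards = gail_reward(d_prob)
```

In a full system the placeholder `d_prob` would come from a trained discriminator, and the share of virtual samples in each batch would presumably be adjusted according to how much the learned model can be trusted.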
Yi Wang
Shengrong Gong
Xin Du
Suzhou University of Science and Technology
Changshu Institute of Technology
Wang et al. (Fri,) studied this question.
www.synapsesocial.com/papers/68e6e758b6db6435876627ec — DOI: https://doi.org/10.21203/rs.3.rs-4263827/v1