Abstract Autonomous drone navigation with deep reinforcement learning (DRL) is hindered by the difficulty of specifying reward functions for vision-based, continuous control in complex environments. We address this by using inverse reinforcement learning (IRL) to recover a task-aligned reward directly from expert demonstrations; specifically, we employ adversarial IRL (AIRL) to learn the reward. In our evaluation, the policy trained on the learned reward improves success rate, smoothness, and trajectory consistency compared with both a carefully tuned human-designed reward and baseline reward functions. These results indicate that learning the reward from demonstrations provides a precise and transferable objective for autonomous flight, yielding better performance and better guidance under our evaluation protocol without manual reward engineering. To the best of our knowledge, this is the first work to successfully apply an AIRL framework to visual drone navigation.
Chen et al. (Sun,) studied this question.