August 1, 2024 · Open Access

End-to-end control of quadrotor based on preference learning

Abstract

The highly coupled, underactuated, and nonlinear characteristics of quadrotors make it difficult to achieve efficient and stable control in unknown dynamic environments through system modeling and controller design. Reinforcement learning offers a new solution to this problem: it learns on the basis of a model of the controlled object, updating and optimizing the control strategy with data generated from interactions with the environment. However, conveying complex objectives to quadrotors is often challenging, because it requires designing reward functions that provide sufficient information. Imitation learning can teach agents interactively by learning from prior knowledge, but it faces its own difficulties, such as acquiring that prior knowledge. In this work, our goal is to bypass reward-function design and improve the generalizability of quadrotors across different tasks. Specifically, we score the trajectories generated by the quadrotor, learn a reward model from preferences between different trajectories, and use this model to train the quadrotor. We demonstrate that training with a reward model fitted to trajectory preferences yields results consistent with those obtained from a directly defined reward function, maintaining satisfactory learning rates and performance in both the “velocity control” and “hovering control” tasks.
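
The preference-learning step described above (score trajectories, fit a reward model to pairwise preferences, then use it as the training reward) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Bradley-Terry preference model, P(a preferred over b) = sigmoid(R(a) - R(b)), which is a common choice in preference-based reinforcement learning but is not specified in the abstract. It also uses a linear reward over hypothetical squared-position-error features for a hovering task, synthetic trajectories in place of real rollouts, and a hypothetical scorer that prefers the trajectory with lower total error. All function names (step_features, fit_reward_model, make_dataset, learned_reward, and so on) are illustrative.

```python
import math
import random


def step_features(state):
    # Hypothetical per-step features for a hovering task: squared position
    # error on each axis (state is an (ex, ey, ez) tuple).
    return [e * e for e in state]


def trajectory_features(traj):
    # The modeled return is R(traj) = w . sum_t phi(s_t), so comparing two
    # trajectories only needs their summed features.
    totals = [0.0, 0.0, 0.0]
    for state in traj:
        for i, f in enumerate(step_features(state)):
            totals[i] += f
    return totals


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    # Numerically stable logistic function via the identity with tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def fit_reward_model(pairs, labels, lr=0.02, epochs=500):
    # Assumed Bradley-Terry model: P(a preferred over b) = sigmoid(R(a) - R(b)).
    # w is fitted by full-batch gradient descent on the mean cross-entropy.
    diffs = [[fa - fb for fa, fb in zip(trajectory_features(a), trajectory_features(b))]
             for a, b in pairs]
    w = [0.0, 0.0, 0.0]
    n = len(diffs)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for d, y in zip(diffs, labels):
            err = y - sigmoid(dot(w, d))
            for i in range(3):
                grad[i] -= err * d[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def make_dataset(rng, n_pairs, horizon=20):
    # Synthetic stand-in for quadrotor rollouts: each trajectory gets its own
    # error scale. A hypothetical scorer prefers lower total squared error.
    def rollout():
        scale = rng.uniform(0.1, 1.0)
        return [tuple(rng.gauss(0.0, scale) for _ in range(3)) for _ in range(horizon)]

    def score(traj):
        return -sum(sum(step_features(s)) for s in traj)

    pairs, labels = [], []
    for _ in range(n_pairs):
        a, b = rollout(), rollout()
        pairs.append((a, b))
        labels.append(1.0 if score(a) > score(b) else 0.0)
    return pairs, labels


def preference_accuracy(w, pairs, labels):
    # Fraction of pairs where the learned reward ranks trajectories like the scorer.
    hits = 0
    for (a, b), y in zip(pairs, labels):
        pred = 1.0 if dot(w, trajectory_features(a)) > dot(w, trajectory_features(b)) else 0.0
        hits += pred == y
    return hits / len(labels)


def learned_reward(w, state):
    # Per-step reward handed to the RL trainer in place of a hand-written one.
    return dot(w, step_features(state))


if __name__ == "__main__":
    rng = random.Random(0)
    train_pairs, train_labels = make_dataset(rng, 200)
    test_pairs, test_labels = make_dataset(rng, 200)
    w = fit_reward_model(train_pairs, train_labels)
    print("learned weights:", [round(x, 3) for x in w])
    print("held-out agreement:", preference_accuracy(w, test_pairs, test_labels))
```

In this sketch the learned weights should come out negative, so states with larger position error receive lower reward, and learned_reward can then be passed to any standard RL algorithm as the per-step reward. A linear model keeps the example small; a neural-network reward model trained on the same cross-entropy objective would be the more usual choice for real quadrotor state spaces.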
