What question did this study set out to answer?

This research aims to develop an effective UAV path tracking method utilizing the Proximal Policy Optimization algorithm to enhance tracking accuracy and decision-making.

April 25, 2026 · Open Access

Research on Proximal Policy Optimization Algorithm in Path Planning for UAV-Based Vehicle Tracking

Key Points

Constructed a 3D path planning model incorporating UAV and ground vehicle kinematics, velocity, and attitude constraints.
Designed an objective function that minimizes tracking error, optimizes energy use, and maintains a safety distance.
Defined the state space, action space, and reward function so that PPO can learn adaptively in dynamic obstacle scenarios.
PPO achieves a trajectory tracking error of about 0.2 m, outperforming Q-learning (about 1 m) and TD3 and APF (around 0.3 m, with noticeable oscillations).
PPO demonstrates superior convergence speed and stability in learning compared to traditional algorithms.
Simulation results indicate enhanced UAV path optimization and intelligent decision-making capabilities with PPO.
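
The summary names the three terms of the objective (tracking error, energy, safety distance) but does not give the formula. The following is a minimal sketch of how such a per-step reward could be assembled, not the authors' implementation. The function name `tracking_reward`, the hover offset, the safety radius, the weights, and the use of squared action magnitude as an energy proxy are all assumptions made for illustration.

```python
import numpy as np

def tracking_reward(uav_pos, vehicle_pos, action, obstacle_positions,
                    desired_offset=(0.0, 0.0, 10.0), safe_dist=2.0,
                    w_track=1.0, w_energy=0.01, w_safe=5.0):
    """Hypothetical per-step reward: negative weighted sum of three costs.

    All default values (offset, safe_dist, weights) are illustrative
    assumptions; the source does not state them.
    """
    uav_pos = np.asarray(uav_pos, dtype=float)
    # Point the UAV should hold relative to the vehicle (assumed offset).
    target = np.asarray(vehicle_pos, dtype=float) + np.asarray(desired_offset, dtype=float)

    # 1) Tracking error: Euclidean distance to the desired tracking point.
    tracking_error = np.linalg.norm(uav_pos - target)

    # 2) Energy proxy: squared magnitude of the control action (assumption).
    energy = float(np.sum(np.square(np.asarray(action, dtype=float))))

    # 3) Safety: penalize how far the UAV intrudes inside safe_dist
    #    of each obstacle; zero when every obstacle is far enough away.
    obstacles = np.asarray(obstacle_positions, dtype=float).reshape(-1, 3)
    if obstacles.shape[0] > 0:
        dists = np.linalg.norm(obstacles - uav_pos, axis=1)
        violation = float(np.clip(safe_dist - dists, 0.0, None).sum())
    else:
        violation = 0.0

    return -(w_track * tracking_error + w_energy * energy + w_safe * violation)
```

With these assumed weights, a UAV sitting exactly at the desired offset with zero control input and no nearby obstacles receives a reward of zero, and every deviation makes the reward more negative.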

Abstract

Unmanned Aerial Vehicle (UAV) tracking of moving ground targets has significant applications in intelligent transportation, logistics distribution, and environmental monitoring, which places greater demands on efficient and stable path-planning methods for vehicle tracking. This study investigates a UAV path tracking approach based on a deep reinforcement learning algorithm, Proximal Policy Optimization (PPO). Starting from the kinematic characteristics of UAVs and ground vehicles, a 3D path planning model is constructed that accounts for spatial coordinates, velocity, and attitude constraints. The objective function combines tracking error minimization, energy optimization, and safety distance constraints. With the state space, action space, and reward function designed accordingly, the PPO algorithm can learn adaptively in complex environments. Compared with the traditional Artificial Potential Field (APF) method and the Q-learning and TD3 algorithms, PPO better balances exploration and exploitation and demonstrates stronger learning stability and global optimization capability in dynamic multi-obstacle scenarios. Simulation results show that PPO-based UAV path planning outperforms Q-learning and the other comparison algorithms in tracking accuracy, convergence speed, and robustness. In specific scenarios, Q-learning achieves a trajectory error of approximately 1 m, TD3 and APF exhibit errors of around 0.3 m with noticeable oscillations, and PPO achieves an error of about 0.2 m. With PPO, the UAV follows the vehicle trajectory smoothly, with a more continuous path and rapidly converging, stable error curves, indicating the promising application potential of PPO in intelligent UAV control.
The PPO-based UAV-tracking path planning method effectively enhances the UAV’s intelligent decision-making and path optimization capabilities, providing new technical approaches and a research foundation for intelligent UAV traffic and cooperative control systems.
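
The learning stability attributed to PPO is usually explained by its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The sketch below computes that objective with NumPy for a batch of samples. The summary does not say which PPO variant or hyperparameters the authors used, so the clipped form and the value `clip_eps=0.2` (a commonly used default) are assumptions, and the function name `ppo_clip_objective` is hypothetical.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the data-collecting policy.
    clip_eps is an assumed value, not one reported in the source.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    adv = np.asarray(advantages, dtype=float)

    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s).
    ratio = np.exp(logp_new - logp_old)

    # Unclipped and clipped surrogate terms.
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv

    # Taking the element-wise minimum gives a pessimistic bound, so the
    # policy gains nothing from pushing the ratio outside the clip range.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

For a sample with positive advantage, the objective stops growing once the ratio exceeds `1 + clip_eps`, which removes the incentive for large policy jumps within one update.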


Cite This Study

Qiao et al. studied this question.

synapsesocial.com/papers/69ec5b3d88ba6daa22dacc09 https://doi.org/10.3390/drones10050319
