What does this research mean for the field?

The RNN-enhanced Diverse Curriculum-driven Learning Algorithm (REDCRL) significantly accelerates policy convergence and improves performance in UAV maneuver decision-making compared to traditional algorithms.

What question did this study set out to answer?

How can a learning algorithm enable UAVs to perform better in dynamic, partially observable environments despite limited experience?

February 19, 2026 | Open Access

An RNN-Enhanced Diverse Curriculum-Driven Learning Algorithm Based on Deep Reinforcement Learning for POMDPs with Limited Experience

Key Points

Set out to develop a learning algorithm that enables UAVs to perform well in dynamic environments despite limited experience.
Introduced an RNN-enhanced curriculum-driven learning algorithm based on deep reinforcement learning.
Modified actor–critic networks to include Bi-LSTM in policy networks.
Developed an Adaptive Multi-Feature Evaluation Experience Replay method for improved experience sampling.
Utilized the Twin Delayed Deep Deterministic Policy Gradient algorithm for policy optimization.
Achieved faster policy convergence compared to traditional algorithms.
Improved performance of decision-making policies for UAVs in long-range dynamic environments.
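
The experience-replay point above can be illustrated with a generic score-weighted buffer. This is a minimal sketch and not the paper's AMFER method, whose features and adaptation rule are not described on this page. The class name `ScoredReplayBuffer`, the `priority` helper, the feature values, and the weights are all hypothetical; the only idea carried over is that each transition is scored on several features and the score shapes both buffer construction (eviction) and sampling.

```python
import random


def priority(features, weights):
    """Combine several per-transition feature scores into one priority.

    Hypothetical scoring rule: a weighted sum with a small floor so that
    every stored transition keeps a non-zero chance of being sampled.
    """
    return max(1e-6, sum(w * f for w, f in zip(weights, features)))


class ScoredReplayBuffer:
    """Replay buffer that evicts and samples by a multi-feature score."""

    def __init__(self, capacity, weights, seed=None):
        self.capacity = capacity
        self.weights = weights
        self.items = []  # list of (transition, score) pairs
        self.rng = random.Random(seed)

    def add(self, transition, features):
        # When full, drop the lowest-scoring stored transition
        # (assumed eviction rule, chosen only for illustration).
        if len(self.items) >= self.capacity:
            worst = min(range(len(self.items)), key=lambda i: self.items[i][1])
            self.items.pop(worst)
        self.items.append((transition, priority(features, self.weights)))

    def sample(self, batch_size):
        # Draw with replacement, with probability proportional to score.
        transitions = [t for t, _ in self.items]
        scores = [s for _, s in self.items]
        return self.rng.choices(transitions, weights=scores, k=batch_size)
```

In this sketch a feature vector might hold, for example, an absolute TD error and a reward magnitude; those choices are assumptions for illustration only.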

Abstract

Autonomous flight is a critical capability for unmanned aerial vehicles (UAVs), enabling applications in wildlife and plant protection, infrastructure inspection, search and rescue, and other complex missions. Although some learning-based methods have achieved considerable progress, traditional algorithms still struggle with real-world challenges, due to the partially observable nature of environments and limited experience regarding the properties of dynamic unknown environments where threats and targets are movable and unpredictable. To address these difficulties, it is necessary to achieve autonomous guidance for UAVs performing long-range missions in dynamic environments (LRGDEs), and to develop a novel end-to-end algorithm that can overcome partial observability under limited state transitions. In this paper, we propose an RNN-enhanced Diverse Curriculum-driven Learning Algorithm (REDCRL) based on deep reinforcement learning. We modify the structure of traditional actor–critic networks and introduce Bi-LSTM into policy networks (referred to as Bi-LSTM-modified Policy Networks (BLPNs)) to alleviate observation incompleteness. Furthermore, to fully exploit the potential value of data and mitigate the problem of insufficient samples, we develop an Adaptive Multi-Feature Evaluation Experience Replay (AMFER) method to reshape the process of experience replay buffer construction and sampling. In addition, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is adopted to optimize UAV-maneuver decision policies. Compared with traditional algorithms, the proposed algorithm can accelerate policy convergence and improve the performance of the trained policy.
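
For readers unfamiliar with TD3, the sketch below shows the two ingredients of its critic target in plain Python: target policy smoothing and the clipped double-Q target. It follows the standard TD3 formulation rather than anything specific to REDCRL; the function names and the default values for `gamma`, `noise_std`, `noise_clip`, and the action bounds are assumptions chosen for illustration.

```python
import random


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip the result to the action bounds."""
    smoothed = []
    for a in target_action:
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(low, min(high, a + noise)))
    return smoothed


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two
    target-critic estimates to limit value overestimation."""
    return reward + gamma * (1.0 - float(done)) * min(next_q1, next_q2)
```

Here `next_q1` and `next_q2` stand for the two target critics evaluated at the next state and the smoothed target action. In full TD3 the actor and the target networks are also updated less often than the critics, which this sketch omits.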

Li et al. studied this question.

synapsesocial.com/papers/6996a8efecb39a600b3f0292 https://doi.org/10.3390/drones10020142
