This article presents a novel curriculum framework, symmetric-response guided reinforcement learning (RL), for autopilot control of fixed-wing aircraft. The framework is driven by an adaptive bidirectional learning curve and a dynamic target scheduling mechanism. Unlike traditional methods with static or overly smoothed learning progressions, the proposed method adjusts the slope of the learning curve in both directions based on historical reward trends, so the learning intensity can increase or decrease as needed. This bidirectional adjustment keeps the agent from being overwhelmed by excessively difficult tasks or stalled by overly easy ones, which improves stability and speeds convergence. Furthermore, generating dynamic targets within an episode from static target constraints both amplifies the reward and implicitly enforces maneuver rate constraints, improving learning efficiency without manual reward shaping. Experiments on trajectory tracking tasks show that the proposed controller achieves faster convergence, reduced overshoot, and more accurate tracking under turbulence. Further tests on waypoint navigation and dynamic pursuit show that it outperforms the baseline, achieving more precise and timely interception. These results highlight the robustness of the controller and its applicability to complex aerial missions such as autonomous air combat.
Li et al. (Thu,) studied this question.
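
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `update_slope` and `ramp_target`, the windowed-mean trend test, and every parameter (`window`, `step`, `min_slope`, `max_slope`, `tol`, `max_rate`, `dt`) are hypothetical choices made only for illustration. The first function raises or lowers a curriculum slope depending on whether recent episode rewards trend up or down; the second moves an in-episode target toward a static final target at a bounded rate, which is one simple way a scheduled target could implicitly limit the commanded maneuver rate.

```python
from typing import List


def update_slope(slope: float, rewards: List[float], window: int = 5,
                 step: float = 0.1, min_slope: float = 0.1,
                 max_slope: float = 2.0, tol: float = 1e-3) -> float:
    """Adjust the curriculum slope in either direction (hypothetical rule).

    Compares the mean reward of the last `window` episodes with the mean
    of the `window` episodes before them.
    """
    if len(rewards) < 2 * window:
        return slope  # not enough history to estimate a trend
    recent = sum(rewards[-window:]) / window
    previous = sum(rewards[-2 * window:-window]) / window
    trend = recent - previous
    if trend > tol:
        slope += step   # rewards improving: increase learning intensity
    elif trend < -tol:
        slope -= step   # rewards degrading: ease off
    # Keep the slope inside the assumed bounds.
    return min(max_slope, max(min_slope, slope))


def ramp_target(current: float, final: float,
                max_rate: float, dt: float) -> float:
    """Step the in-episode target toward the static final target.

    The target moves by at most `max_rate * dt` per call, so the command
    seen by the agent never changes faster than `max_rate`.
    """
    delta = final - current
    max_step = max_rate * dt
    if abs(delta) <= max_step:
        return final
    return current + max_step if delta > 0 else current - max_step


if __name__ == "__main__":
    # Rewards improve over the last five episodes, so the slope increases.
    print(update_slope(1.0, [0.0] * 5 + [1.0] * 5))
    # Ramp a roll-angle target from 0 toward 10 degrees at 2 deg/s, dt = 0.5 s.
    target = 0.0
    for _ in range(3):
        target = ramp_target(target, 10.0, max_rate=2.0, dt=0.5)
        print(target)
```

In this sketch the slope is clamped so that a long run of improving or degrading rewards cannot push the curriculum outside a fixed range. Whether the original framework uses such bounds is not stated in the abstract.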