ABSTRACT Multi‐agent reinforcement learning for within‐visual‐range air combat remains challenging because of exploration difficulty, delayed task feedback, high‐dimensional continuous control, non‐stationary opponents, and instability when adapting policies from single‐agent to multi‐agent training. This paper proposes a transfer reinforcement learning via self‐play (TRLSP) framework that integrates curriculum learning, transfer adaptation, and multi‐agent twin delayed deep deterministic policy gradient (MATD3) for autonomous air combat decision‐making. TRLSP is organized as a three‐stage training pipeline. Stage 1 acquires a single‐agent expert policy through curriculum‐guided learning against progressively stronger opponents. Stage 2 transfers the expert policy to the multi‐agent setting through a progressive network unfreezing schedule with an explicit layer‐release order and learning‐rate scaling. Stage 3 performs MATD3‐based self‐play with a fixed‐size historical strategy pool, from which opponent snapshots are sampled uniformly at random and which is updated by a first‐in, first‐out (FIFO) rule. The framework is evaluated under matched budgets using terminal/combat metrics, cross‐play comparisons, and out‐of‐distribution (OOD) robustness tests with confidence intervals. Experimental results show that TRLSP achieves stronger competitive performance than representative baselines and remains more robust across held‐out opponents and shifted initial conditions.
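The Stage 2 and Stage 3 mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no pool size, layer-release order, or scaling factor, so the class `OpponentPool`, the function `unfreeze_schedule`, the output-side-first release order, and the geometric learning-rate scaling are all assumptions made for illustration.

```python
import random
from collections import deque


class OpponentPool:
    """Fixed-size pool of past policy snapshots (hypothetical helper)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen discards the oldest entry on overflow,
        # which gives the FIFO update rule.
        self._snapshots = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, snapshot):
        """Store a new snapshot, evicting the oldest one if the pool is full."""
        self._snapshots.append(snapshot)

    def sample(self):
        """Pick one stored snapshot uniformly at random."""
        if not self._snapshots:
            raise ValueError("opponent pool is empty")
        return self._rng.choice(list(self._snapshots))

    def snapshots(self):
        """Return the stored snapshots, oldest first."""
        return list(self._snapshots)

    def __len__(self):
        return len(self._snapshots)


def unfreeze_schedule(layer_names, step, steps_per_release, base_lr, lr_scale):
    """Map each trainable layer at `step` to its learning rate.

    Assumed scheme (not taken from the paper): layers are released from the
    output side toward the input side, one every `steps_per_release` steps,
    and each layer further from the output gets its learning rate multiplied
    by another factor of `lr_scale`.
    """
    n_released = min(len(layer_names), 1 + step // steps_per_release)
    released = layer_names[::-1][:n_released]  # output-side layers first
    return {name: base_lr * lr_scale ** i for i, name in enumerate(released)}


if __name__ == "__main__":
    pool = OpponentPool(capacity=3, seed=0)
    for version in range(5):
        pool.add(f"policy_v{version}")
    print(pool.snapshots())  # only the three most recent snapshots remain
    print(pool.sample())     # one of them, chosen uniformly at random
    print(unfreeze_schedule(["fc1", "fc2", "out"], step=10,
                            steps_per_release=10, base_lr=1e-3, lr_scale=0.5))
```

In a real training loop the snapshots would be copies of network parameters rather than strings, and the returned learning rates would be passed to the optimizer's per-layer parameter groups; layers absent from the returned mapping stay frozen.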
Yuzhou Huang
Yaosong Long
Zhongtao Cheng
International Journal of Robust and Nonlinear Control
Huazhong University of Science and Technology
Huang et al. (Fri,) studied this question.
www.synapsesocial.com/papers/6a095b5d7880e6d24efe10da — DOI: https://doi.org/10.1002/rnc.70586