What question did this study set out to answer?

The aim is to develop a framework that enhances decision-making for unmanned aerial vehicles in air combat through transfer reinforcement learning.

May 17, 2026

Autonomous Maneuver Decision‐Making for Unmanned Aerial Vehicles via Transfer Reinforcement Learning

Key Points

Proposed a transfer reinforcement learning via self-play (TRLSP) framework organized in three stages: single-agent expert policy acquisition, policy transfer to multi-agent settings, and self-play training.
Used curriculum learning and the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm.
Evaluated under matched budgets using terminal and combat metrics, cross-play comparisons, and out-of-distribution (OOD) robustness tests with confidence intervals.
TRLSP achieved stronger competitive performance than representative baseline methods.
The framework remained robust against held-out opponents and under shifted initial conditions.
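The key points name MATD3 as the multi-agent learner. As a reference point only, the sketch below shows the critic target that TD3-family methods such as MATD3 typically compute: each agent's target action is smoothed with clipped Gaussian noise, and the smaller of two target critics' estimates is used to curb value overestimation. It is a minimal NumPy sketch under stated assumptions, not the paper's implementation. The function name, the centralized critics that see all agents' observations and actions (as in MADDPG-style training), the toy actors and critics, and the hyperparameter defaults (`gamma`, `noise_std`, `noise_clip`, actions bounded in [-1, 1]) are all illustrative. Delayed actor updates and the neural networks themselves are omitted.

```python
import numpy as np


def matd3_critic_target(rewards, dones, next_obs, target_actors,
                        target_q1, target_q2, gamma=0.99,
                        noise_std=0.2, noise_clip=0.5,
                        act_low=-1.0, act_high=1.0, rng=None):
    """TD3-style clipped double-Q target for one agent's twin critics (sketch).

    rewards, dones: arrays of shape (batch,) for the agent being updated.
    next_obs: list with one (batch, obs_dim_i) array per agent.
    target_actors: list with one callable per agent, obs -> (batch, act_dim_i).
    target_q1, target_q2: centralized target critics, (joint_obs, joint_act) -> (batch,).
    """
    rng = np.random.default_rng() if rng is None else rng
    next_actions = []
    for obs_i, actor_i in zip(next_obs, target_actors):
        a_i = actor_i(obs_i)
        # Target policy smoothing: clipped Gaussian noise, then enforce action bounds.
        noise = np.clip(rng.normal(0.0, noise_std, size=a_i.shape), -noise_clip, noise_clip)
        next_actions.append(np.clip(a_i + noise, act_low, act_high))
    joint_obs = np.concatenate(next_obs, axis=-1)
    joint_act = np.concatenate(next_actions, axis=-1)
    # Clipped double-Q: take the smaller of the two target critics' estimates.
    q_next = np.minimum(target_q1(joint_obs, joint_act), target_q2(joint_obs, joint_act))
    # No bootstrapping past terminal transitions (dones == 1).
    return rewards + gamma * (1.0 - dones) * q_next


# Toy usage (hypothetical actors and critics) with noise disabled for determinism.
def q1(o, a):
    return o.sum(axis=-1) + a.sum(axis=-1)


def q2(o, a):
    return o.sum(axis=-1) + a.sum(axis=-1) + 1.0


obs = [np.array([[1.0], [2.0]]), np.array([[1.0], [2.0]])]
actors = [lambda o: np.clip(0.5 * o, -1.0, 1.0)] * 2
y = matd3_critic_target(np.array([1.0, 1.0]), np.array([0.0, 1.0]),
                        obs, actors, q1, q2, gamma=0.9, noise_std=0.0)
print(y)  # approximately [3.7, 1.0]
```

In the toy call, the second transition is terminal, so its target is just the reward; the first bootstraps from `q1`, the smaller of the two critics.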

Abstract

Aerial multi-agent reinforcement learning for within-visual-range air combat remains challenging because of exploration difficulty, delayed task feedback, high-dimensional continuous control, non-stationary opponents, and instability when adapting policies from single-agent to multi-agent training. This paper proposes a transfer reinforcement learning via self-play (TRLSP) framework that integrates curriculum learning, transfer adaptation, and the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm for autonomous air combat decision-making. TRLSP is organized as a three-stage training pipeline. Stage 1 acquires a single-agent expert policy through curriculum-guided learning against progressively stronger opponents. Stage 2 transfers the expert policy to the multi-agent setting through a progressive network-unfreezing schedule with an explicit layer-release order and learning-rate scaling. Stage 3 performs MATD3-based self-play against a fixed-size historical strategy pool: opponent snapshots are sampled from the pool uniformly at random, and the pool is updated under a first-in, first-out (FIFO) rule. The framework is evaluated under matched budgets using terminal and combat metrics, cross-play comparisons, and out-of-distribution (OOD) robustness tests with confidence intervals. Experimental results show that TRLSP achieves stronger competitive performance than representative baselines and remains more robust against held-out opponents and shifted initial conditions.
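The Stage 3 opponent pool is described by three properties: a fixed size, uniform random sampling, and FIFO replacement. The class below is a minimal Python sketch of exactly those properties, assuming a policy can be snapshotted with `copy.deepcopy`. The class and method names, the `capacity` value, the dict standing in for policy parameters, and the add-a-snapshot-every-iteration schedule in the usage lines are hypothetical, not taken from the paper, and the MATD3 training step itself is left out.

```python
import copy
import random
from collections import deque


class HistoricalStrategyPool:
    """Fixed-size pool of frozen opponent snapshots for self-play (hypothetical name).

    Opponents are sampled uniformly at random; once the pool is full,
    adding a snapshot evicts the oldest one (first in, first out).
    """

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen discards its leftmost (oldest) item on overflow.
        self._snapshots = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, policy):
        # Deep-copy so that further training of the live policy
        # cannot modify the frozen snapshot.
        self._snapshots.append(copy.deepcopy(policy))

    def sample(self):
        if not self._snapshots:
            raise IndexError("cannot sample from an empty pool")
        # Uniform random choice over every stored snapshot.
        return self._rng.choice(self._snapshots)

    def snapshots(self):
        # Oldest snapshot first.
        return list(self._snapshots)

    def __len__(self):
        return len(self._snapshots)


# Toy usage: a dict stands in for policy parameters (assumption).
pool = HistoricalStrategyPool(capacity=3, seed=0)
live = {"version": 0, "weights": [0.0, 0.0]}
for it in range(5):
    live["version"] = it
    live["weights"] = [float(it), float(it)]
    pool.add(live)              # snapshot the current live policy
    opponent = pool.sample()    # opponent for this round of self-play
    # ... an MATD3 update of `live` against `opponent` would go here ...

print([s["version"] for s in pool.snapshots()])  # [2, 3, 4]
```

Using `collections.deque(maxlen=...)` provides the FIFO eviction directly, since appending to a full deque drops its oldest entry. The deep copy matters because the live policy keeps training after each snapshot is taken.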
