May 9, 2026 | Open Access

A Curriculum-Learning-Assisted MAPPO-Based Algorithm for Dynamic Spectrum Access and Anti-Jamming in UAV Swarms

Key Points

Aims to improve communication reliability and efficiency for UAV swarms operating in complex environments.
Proposes a Curriculum Learning-assisted Multi-Agent Proximal Policy Optimization (CL-MAPPO) algorithm for dynamic spectrum access.
Uses a Centralized Training with Decentralized Execution (CTDE) architecture to enable implicit spectrum cooperation within the swarm.
Designs a three-stage progressive curriculum learning mechanism covering basic collision avoidance, load balancing, and dynamic anti-jamming.
Outperforms the Carrier Sense Multiple Access (CSMA) and random frequency hopping baselines in normalized throughput and channel collision rate.
Converges significantly faster than Multi-Agent Deep Deterministic Policy Gradient (MADDPG).
Demonstrates these gains in simulated scenarios with dynamic sweep jamming and high-load multi-drone communication.
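
The three-stage curriculum named in the key points can be illustrated with a minimal sketch. This is not the authors' implementation: the stage names, the promotion threshold, the averaging window, and the reward weights below are all hypothetical values chosen for illustration. It only shows how a scheduler might promote agents from one stage to the next and how a per-stage reward could switch penalty terms on progressively.

```python
# Hypothetical stage names mirroring the three stages in the key points.
STAGES = ("collision_avoidance", "load_balancing", "anti_jamming")

# Hypothetical penalty weights per stage: (collision, load imbalance, jamming).
# Later stages switch on additional penalty terms.
STAGE_WEIGHTS = {
    "collision_avoidance": (1.0, 0.0, 0.0),
    "load_balancing": (1.0, 0.5, 0.0),
    "anti_jamming": (1.0, 0.5, 1.0),
}


def staged_reward(stage, success, collided, jammed, load_imbalance):
    """Per-agent reward whose penalty terms depend on the curriculum stage."""
    w_col, w_load, w_jam = STAGE_WEIGHTS[stage]
    reward = 1.0 if success else 0.0      # base reward for a delivered packet
    reward -= w_col * collided            # penalty for a channel collision
    reward -= w_load * load_imbalance     # penalty for uneven channel load
    reward -= w_jam * jammed              # penalty for hitting a jammed channel
    return reward


class Curriculum:
    """Promotes to the next stage once the recent mean success rate is high enough."""

    def __init__(self, promote_at=0.9, window=100):
        self.promote_at = promote_at      # assumed promotion threshold
        self.window = window              # assumed number of episodes averaged
        self.stage_index = 0
        self.recent = []

    @property
    def stage(self):
        return STAGES[self.stage_index]

    def record(self, success_rate):
        """Log one episode's success rate; return True if a promotion happened."""
        self.recent = (self.recent + [success_rate])[-self.window:]
        window_full = len(self.recent) == self.window
        mean = sum(self.recent) / len(self.recent)
        is_last = self.stage_index == len(STAGES) - 1
        if window_full and mean >= self.promote_at and not is_last:
            self.stage_index += 1
            self.recent = []              # restart statistics for the new stage
            return True
        return False
```

In a training loop, `Curriculum.record` would be called once per episode and `staged_reward` once per agent per slot, using `curriculum.stage` to select the active weights.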

Abstract

Drone swarms are increasingly used for cooperative missions. However, establishing high-concurrency, highly reliable communication links in complex environments remains a significant challenge. Existing methods based on traditional Medium Access Control (MAC) protocols struggle to cope with high-density collisions, while conventional deep reinforcement learning (DRL) approaches often encounter convergence difficulties in non-stationary interference environments, leading to notable limitations in anti-jamming robustness and algorithmic efficiency. To address these limitations, this paper proposes a dynamic access algorithm based on Curriculum Learning-assisted Multi-Agent Proximal Policy Optimization (CL-MAPPO). Specifically, we adopt a Centralized Training with Decentralized Execution (CTDE) architecture to enable implicit spectrum cooperation within the swarm. We further design a three-stage progressive curriculum learning mechanism (basic collision avoidance, load balancing, and dynamic anti-jamming) coupled with a phased reward reshaping strategy, guiding the agents to progressively master intelligent frequency-hopping decisions in complex environments. Experimental results demonstrate that in simulated scenarios involving dynamic sweep jamming and high-load multi-drone communication, the proposed method significantly outperforms baseline models such as Carrier Sense Multiple Access (CSMA), random frequency hopping, and Multi-Agent Deep Deterministic Policy Gradient (MADDPG) in terms of normalized throughput, channel collision rate, and convergence speed. This research provides theoretical support and an algorithmic foundation for achieving highly reliable access in large-scale swarm data links under harsh environmental conditions.
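
To make the evaluation setting concrete, the sketch below simulates the random frequency hopping baseline against a sweep jammer and reports a normalized throughput and a channel collision rate. It is a toy model under stated assumptions, not the paper's simulator: the jammer is assumed to step linearly through the band one channel at a time, every UAV is assumed to transmit in every slot, and both metric definitions are assumed because the abstract does not give them. The names `sweep_jammer_channel` and `simulate_random_hopping` are hypothetical.

```python
import random


def sweep_jammer_channel(t, num_channels, dwell=1):
    """Channel jammed at slot t by a jammer that steps linearly through the band.

    Assumption: the jammer stays `dwell` slots on each channel, then moves on.
    """
    return (t // dwell) % num_channels


def simulate_random_hopping(num_uavs, num_channels, num_slots, seed=0):
    """Return (normalized_throughput, collision_rate) for uniform random hopping."""
    rng = random.Random(seed)
    successes = 0
    collisions = 0
    for t in range(num_slots):
        jammed = sweep_jammer_channel(t, num_channels)
        # Each UAV picks a channel uniformly at random in every slot.
        picks = [rng.randrange(num_channels) for _ in range(num_uavs)]
        counts = {}
        for ch in picks:
            counts[ch] = counts.get(ch, 0) + 1
        for ch in picks:
            if counts[ch] > 1:
                collisions += 1           # two or more UAVs on the same channel
            elif ch != jammed:
                successes += 1            # sole user of an unjammed channel
    # Assumed definition: successes relative to the best case of one success
    # per UAV or per channel in each slot, whichever is smaller.
    throughput = successes / (num_slots * min(num_uavs, num_channels))
    # Assumed definition: fraction of transmission attempts that collided.
    collision_rate = collisions / (num_uavs * num_slots)
    return throughput, collision_rate
```

Calling `simulate_random_hopping(8, 4, 1000)` gives a baseline pair of numbers against which a learned policy could be compared in the same toy environment. The values depend on the seed and on the assumed metric definitions.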
