Drone swarms are increasingly used for cooperative missions. However, establishing high-concurrency, highly reliable communication links in complex environments remains a significant challenge. Methods based on traditional Medium Access Control (MAC) protocols struggle to cope with high-density collisions, while conventional deep reinforcement learning (DRL) approaches often struggle to converge in non-stationary interference environments, which limits both their anti-jamming robustness and their algorithmic efficiency. To address these limitations, this paper proposes a dynamic access algorithm based on Curriculum Learning-assisted Multi-Agent Proximal Policy Optimization (CL-MAPPO). Specifically, we adopt a Centralized Training with Decentralized Execution (CTDE) architecture to enable implicit spectrum cooperation within the swarm. We further design a three-stage progressive curriculum learning mechanism (basic collision avoidance, load balancing, and dynamic anti-jamming) coupled with a phased reward reshaping strategy, which guides the agents to progressively master intelligent frequency-hopping decisions in complex environments. Experimental results show that, in simulated scenarios involving dynamic sweep jamming and high-load multi-drone communication, the proposed method significantly outperforms baselines such as Carrier Sense Multiple Access (CSMA), random frequency hopping, and Multi-Agent Deep Deterministic Policy Gradient (MADDPG) in normalized throughput, channel collision rate, and convergence speed. This research provides theoretical support and an algorithmic foundation for highly reliable access in large-scale swarm data links under harsh environmental conditions.
Yuan et al. studied this question.
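The abstract names a three-stage curriculum with phased reward reshaping but does not give the reward terms. The sketch below illustrates one way such a scheme could be structured, with each stage adding a term to the per-agent reward. The names `Stage`, `stage_rewards`, and `next_stage`, along with every weight and the promotion threshold, are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter
from enum import Enum


class Stage(Enum):
    """Curriculum stages named in the abstract."""
    COLLISION_AVOIDANCE = 1
    LOAD_BALANCING = 2
    ANTI_JAMMING = 3


def stage_rewards(stage, choices, jammed, num_channels):
    """Return one reward per agent for a single time slot.

    choices: channel index picked by each agent.
    jammed: set of channel indices jammed in this slot.
    All weights below are illustrative assumptions.
    """
    counts = Counter(choices)
    # Shared imbalance measure: distance of channel occupancy from uniform.
    ideal = len(choices) / num_channels
    imbalance = sum(abs(counts[c] - ideal) for c in range(num_channels))

    rewards = []
    for ch in choices:
        # Stage 1 term: +1 for a collision-free slot, -1 for a collision.
        r = -1.0 if counts[ch] > 1 else 1.0
        if stage.value >= Stage.LOAD_BALANCING.value:
            # Stage 2 term: penalise uneven use of the channels.
            r -= 0.1 * imbalance
        if stage.value >= Stage.ANTI_JAMMING.value and ch in jammed:
            # Stage 3 term: penalise transmitting on a jammed channel.
            r -= 2.0
        rewards.append(r)
    return rewards


def next_stage(stage, success_rate, threshold=0.9):
    """Advance the curriculum once the success rate reaches the threshold."""
    if success_rate >= threshold and stage is not Stage.ANTI_JAMMING:
        return Stage(stage.value + 1)
    return stage
```

In this sketch, earlier terms stay active in later stages, so an agent promoted to the anti-jamming stage is still rewarded for avoiding collisions and balancing load. Whether the paper accumulates terms this way or swaps them out per stage is not stated in the abstract.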