The coordinated deployment of multiple base stations (BSs) and the tuning of their antenna configurations play a crucial role in ensuring high-quality communication services, especially given the dense 5G BS deployment in megacities. However, traditional optimization methods, such as heuristics and reinforcement learning (RL), struggle with problems that require coordinating hundreds of BSs, because they do not cope well with the complexity and scale of such scenarios. To address these challenges, this paper proposes Hierarchical Multi-Agent Proximal Policy Optimization with Representation Learning (HMAPPO-RL). A hierarchical structure decouples the optimization problem into two sub-problems: BS deployment and antenna parameter tuning. Unlike step-by-step methods that optimize BS location and antenna parameters separately, HMAPPO-RL jointly optimizes the two sub-problems through an interactive mechanism that accounts for the mutual influence of BS location and antenna configuration. To handle the scale posed by hundreds of BSs, we use the downsampling and upsampling mechanisms of the UNet network to integrate global and local information from the large-scale state. Because complex environmental information makes it difficult for the agent to evaluate state values in large-scale scenarios, we add a representation learning module to improve the accuracy of the agent's state value estimates. Experiments on a precise mobile network simulator, including a comparative analysis with existing state-of-the-art methods, demonstrate the superiority of HMAPPO-RL. HMAPPO-RL achieves a coverage rate of 91.66% and an average throughput of 4,983,537 bit/s, improvements of 3.62% and 6.75%, respectively, over the MAPPO algorithm.
Su et al. studied this question.
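The abstract's idea of combining global and local information through downsampling and upsampling can be illustrated with a toy example. The sketch below is not the HMAPPO-RL network: it assumes a small 2D grid as the state, replaces UNet's learned convolutions with fixed 2x2 average pooling and nearest-neighbour upsampling, and uses hypothetical names (`downsample`, `upsample`, `fuse_global_local`, `demand`) chosen only for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]


def downsample(grid: Grid) -> Grid:
    """2x2 average pooling; assumes even height and width (illustrative only)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def upsample(grid: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the row vertically
    return out


def fuse_global_local(grid: Grid) -> List[List[Tuple[float, float]]]:
    """Pair each cell's local value with its coarse context, skip-connection style."""
    coarse = upsample(downsample(grid))
    return [
        [(grid[r][c], coarse[r][c]) for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]


# Hypothetical 4x4 state map (e.g. per-cell traffic demand); values are made up.
demand: Grid = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 4.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]

fused = fuse_global_local(demand)
print(fused[0][0])  # local value and regional average for the top-left cell
```

Each fused cell carries both its own value and the average of its 2x2 neighbourhood, which is the basic reason an encoder-decoder with skip connections can expose fine detail and wider context to a policy at the same time. In a real UNet the pooling and upsampling stages are interleaved with learned convolutions and repeated over several resolutions.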