August 25, 2025 | Open Access

Jointly Optimizing Deployment and Antenna of Base Stations using Hierarchical Reinforcement Learning

Key Points

HMAPPO-RL achieved a coverage rate of 91.66% and an average throughput of 4,983,537 bit/s (about 4.98 Mbit/s).
The method was validated on a mobile network simulator and compared against existing optimization techniques.
A hierarchical structure separates base station deployment from antenna parameter tuning, which makes scenarios with hundreds of base stations easier to manage.
Compared with the conventional MAPPO algorithm, coverage rate improved by 3.62% and throughput by 6.75%.
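
The summary does not say whether the reported gains are relative or measured in percentage points, and it does not report MAPPO's own scores. The sketch below backs out the MAPPO baseline implied by the reported figures under each reading. The function name `implied_baseline` is hypothetical, and the resulting baselines are derived estimates under those assumptions, not values reported by the paper.

```python
def implied_baseline(value, gain_pct, relative):
    """Back out a baseline from a reported value and a reported gain.

    relative=True  assumes value = baseline * (1 + gain_pct / 100).
    relative=False assumes value = baseline + gain_pct (percentage points),
    which only makes sense for quantities that are themselves percentages.
    """
    if relative:
        return value / (1.0 + gain_pct / 100.0)
    return value - gain_pct


# Figures quoted in the summary.
coverage_pct = 91.66          # HMAPPO-RL coverage rate, percent
throughput_bps = 4_983_537    # HMAPPO-RL average throughput, bit/s

# Assumption A: the 3.62% coverage gain is in percentage points.
cov_abs = implied_baseline(coverage_pct, 3.62, relative=False)
# Assumption B: the 3.62% coverage gain is relative to MAPPO's coverage.
cov_rel = implied_baseline(coverage_pct, 3.62, relative=True)
# Throughput is not a percentage, so only the relative reading applies.
thr_rel = implied_baseline(throughput_bps, 6.75, relative=True)

print(f"Implied MAPPO coverage (percentage points): {cov_abs:.2f}%")
print(f"Implied MAPPO coverage (relative gain):     {cov_rel:.2f}%")
print(f"Implied MAPPO throughput (relative gain):   {thr_rel:,.0f} bit/s")
```

The two readings of the coverage gain differ by less than half a percentage point, so the qualitative conclusion holds either way.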

Abstract

The coordinated deployment of multiple base stations (BSs) and the tuning of their antenna configurations play a crucial role in ensuring high-quality communication services, especially for dense 5G BS deployment in megacities. However, traditional optimization methods, such as heuristics and reinforcement learning (RL), struggle to coordinate hundreds of BSs because they cannot handle the complexity and scale of such scenarios. To address these challenges, this paper proposes Hierarchical Multi-Agent Proximal Policy Optimization with Representation Learning (HMAPPO-RL). A hierarchical structure decouples the optimization problem into two sub-problems: BS deployment and antenna parameter tuning. Unlike step-by-step methods that optimize BS location and antenna parameters one after the other, HMAPPO-RL optimizes the two jointly through an interactive mechanism that accounts for the mutual influence of BS location and antenna configuration. To cope with the scale posed by hundreds of BSs, we use the upsampling and downsampling mechanisms of the UNet network to integrate global and local information from the large-scale state. Because complex environmental information makes it difficult for the agent to evaluate state values in large-scale scenarios, we add a representation learning module to improve the accuracy of the agent's state value estimation. Experiments on a precise mobile network simulator, including a comparison with existing state-of-the-art methods, demonstrate the superiority of HMAPPO-RL. It achieves a coverage rate of 91.66% and an average throughput of 4,983,537 bit/s, improvements of 3.62% and 6.75% in coverage rate and throughput, respectively, over the MAPPO algorithm.
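
To illustrate how downsampling and upsampling can combine global and local information, here is a minimal sketch in plain Python. It is not the paper's network: a real UNet uses learned convolutions over several resolution levels, whereas this example assumes fixed 2x2 average pooling, nearest-neighbour upsampling, and a single skip connection. The `demand` grid and all function names are hypothetical.

```python
def downsample(grid):
    """2x2 average pooling: halves each dimension (even sizes assumed)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample(grid):
    """Nearest-neighbour upsampling: doubles each dimension."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse(local, coarse):
    """Skip connection: pair each cell's local value with its regional summary."""
    return [
        [(l, c) for l, c in zip(local_row, coarse_row)]
        for local_row, coarse_row in zip(local, coarse)
    ]


# Hypothetical 4x4 map of user demand per grid cell.
demand = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [1, 3, 0, 0],
    [3, 1, 0, 0],
]

coarse = downsample(demand)                # 2x2 regional averages
features = fuse(demand, upsample(coarse))  # 4x4 grid of (local, regional) pairs

for row in features:
    print(row)
```

Each output cell carries its own value alongside the average of its 2x2 region, so a policy reading `features` sees both fine-grained detail and wider context at every location.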
