The increasing densification of cell-free massive multiple-input multiple-output (MIMO) networks makes access point (AP) switch on/off (ASO) a key mechanism for improving energy efficiency in future wireless systems. While reinforcement learning (RL) has been explored for ASO, differences in modeling assumptions and evaluation scope leave open questions regarding robustness and scalability. In this work, ASO is investigated from an explicit energy-efficiency perspective using an RL framework based on Proximal Policy Optimization (PPO). The policy learns state-dependent AP activation under partial observability using compact per-AP large-scale fading statistics and power parameters, without requiring instantaneous small-scale channel state information or combinatorial search, enabling practical online implementation. A comprehensive evaluation is conducted under a unified and reproducible simulation framework across three cell-free deployment scenarios of increasing size that preserve AP density while incorporating realistic channel and power consumption models. Performance is assessed through both average and distribution-based metrics. Numerical results show that the PPO-based policy consistently outperforms random activation and the all-on baseline, achieving energy-efficiency improvements of up to 66% and nearly 50%, respectively, while activating a comparable number of APs. Moreover, the learned policy maintains robust performance as the network scales, reducing the likelihood of highly energy-inefficient operating regimes.
García-Barrios et al. studied this question.
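
To make the energy-efficiency objective behind ASO concrete, the following is a minimal sketch of how the energy efficiency of a given AP activation pattern could be scored from large-scale fading statistics alone. It is not the authors' model. The function name `energy_efficiency`, the interference-free rate expression, and all default parameter values (transmit power, fixed per-AP power, noise power, bandwidth) are illustrative assumptions chosen only to show the trade-off between sum rate and consumed power.

```python
import numpy as np

def energy_efficiency(active, beta, p_tx_w=0.2, p_fixed_w=5.0,
                      noise_w=1e-13, bandwidth_hz=20e6):
    """Toy energy efficiency (bit/J) of an AP activation pattern.

    active : length-L boolean mask, True where an AP is switched on.
    beta   : (L, K) array of large-scale fading gains from AP l to user k.
    All power and bandwidth defaults are hypothetical placeholders.
    """
    active = np.asarray(active, dtype=bool)
    beta = np.asarray(beta, dtype=float)
    if not active.any():
        return 0.0  # no active AP: no rate is delivered

    # Assumed interference-free SNR per user, summed over the active APs.
    snr = p_tx_w * beta[active].sum(axis=0) / noise_w
    sum_rate_bps = bandwidth_hz * np.log2(1.0 + snr).sum()

    # Assumed power model: each active AP draws fixed plus transmit power.
    total_power_w = active.sum() * (p_fixed_w + p_tx_w)
    return sum_rate_bps / total_power_w


# Hypothetical two-AP, two-user example: the second AP is far from both users.
beta_example = np.array([[1e-9, 1e-9],
                         [1e-15, 1e-15]])
ee_all_on = energy_efficiency([True, True], beta_example)
ee_one_ap = energy_efficiency([True, False], beta_example)
```

Under these assumptions, switching off an AP that contributes almost nothing to the received signal barely changes the sum rate but removes its power draw, so the ratio improves. A learned policy such as the PPO agent described in the abstract would select the activation mask from the observed per-AP statistics rather than by enumerating masks.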