With the increasing deployment of Reinforcement Learning (RL) for network optimization at the edge of wireless networks, the RL workload emerges as a significant challenge. While the placement of general Machine Learning workloads across the cloud–edge continuum has been widely studied, existing solutions typically exclude RL techniques due to their distinct structure and operational requirements. In this work, we propose a framework for RL workload placement in the cloud–edge continuum, enabling the scaling of RL actor processes across both domains. In this framework, agents that interact with the environment through simple feedback loops are deployed at the edge, while training and model storage are performed in the cloud, where sufficient computational resources are available. We implement and simulate a prototype of one scaled RL actor that performs Quality-of-Service-aware resource block assignment with separate threads for environment interaction, inference, buffering/sampling, and the learning process. Finally, we outline the open challenges of the proposed framework.
Ghafouri et al. (Mon,) studied this question.
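The four-thread actor structure mentioned above (environment interaction, inference, buffering/sampling, learning) can be illustrated with a minimal sketch. This is not the prototype described in this work. The queue layout, the function names, the random channel-quality observations, the greedy "best channel gets the block" policy, and the placeholder learner are all assumptions chosen only to show how the four stages could be decoupled.

```python
import queue
import random
import threading
from collections import deque

STOP = None  # sentinel passed down the pipeline to shut each stage down


def env_loop(obs_q, steps, n_users, rng):
    """Environment interaction: emit one per-user channel-quality vector per step."""
    for t in range(steps):
        obs_q.put((t, [rng.random() for _ in range(n_users)]))
    obs_q.put(STOP)


def inference_loop(obs_q, trans_q):
    """Inference: hypothetical greedy policy, the best-channel user gets the block."""
    while (item := obs_q.get()) is not STOP:
        t, obs = item
        action = max(range(len(obs)), key=obs.__getitem__)
        reward = obs[action]  # assumed reward: channel quality of the served user
        trans_q.put((t, obs, action, reward))
    trans_q.put(STOP)


def buffer_loop(trans_q, batch_q, capacity, batch_size, rng):
    """Buffering/sampling: bounded replay buffer, one sampled batch per new transition."""
    buffer = deque(maxlen=capacity)
    while (item := trans_q.get()) is not STOP:
        buffer.append(item)
        if len(buffer) >= batch_size:
            batch_q.put(rng.sample(list(buffer), batch_size))
    batch_q.put(STOP)


def learner_loop(batch_q, stats):
    """Learning: placeholder that counts updates instead of training a model."""
    while (batch := batch_q.get()) is not STOP:
        stats["updates"] += 1
        stats["mean_reward"] = sum(tr[3] for tr in batch) / len(batch)


def run_actor(steps=50, n_users=4, capacity=32, batch_size=8, seed=0):
    """Start the four stages as threads connected by queues and wait for them."""
    obs_q, trans_q, batch_q = queue.Queue(), queue.Queue(), queue.Queue()
    stats = {"updates": 0, "mean_reward": 0.0}
    threads = [
        threading.Thread(target=env_loop,
                         args=(obs_q, steps, n_users, random.Random(seed))),
        threading.Thread(target=inference_loop, args=(obs_q, trans_q)),
        threading.Thread(target=buffer_loop,
                         args=(trans_q, batch_q, capacity, batch_size,
                               random.Random(seed + 1))),
        threading.Thread(target=learner_loop, args=(batch_q, stats)),
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return stats


if __name__ == "__main__":
    print(run_actor())
```

In this sketch the queues are the only coupling between stages, so the learner stage could in principle be moved to a remote host by replacing `batch_q` with a network transport. That substitution is likewise an assumption and is not claimed by the abstract.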