Network slicing is a cornerstone of 5G/6G vertical services. In practical deployments, however, mobile network operators (MNOs) must adjust slice service level agreement (SLA) weights based on quality of experience (QoE). These adjustments cause rapid, non-stationary changes in the objective, which can destabilize deep reinforcement learning (DRL) slicing policies and necessitate retraining. This paper proposes Continual Mixture of Experts (CoMEx) for fast policy adaptation. CoMEx pre-trains and freezes multiple expert policies under diverse SLA preferences, explicitly appends the SLA weight vector to observations, and trains a DRL-based gating network that fuses expert actions at the step level to adapt quickly to unseen SLA configurations. To broaden coverage without degrading existing experts, CoMEx further incorporates a masked expert expansion mechanism that incrementally adds new experts and fine-tunes the gate. In radio access network (RAN) slicing, step-level DRL gating generalizes best, attaining a mean score of 78.95 under unseen SLA weights and surpassing episode-level and supervised gating by 2.40% and 27.67%, respectively. CoMEx's extensibility is further demonstrated by a 7.08% performance gain (reaching 84.54) when a fourth expert is added. These results confirm the framework's capacity for timely and robust policy adaptation in non-stationary SLA environments.
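To make the step-level gating idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the gate produces one logit per expert slot, that fusion is a convex combination of expert actions, that experts see the raw observation while the gate sees the observation with the SLA weights appended, and that the expansion mask simply hides expert slots that have not been added yet. All function names, the toy experts, and the toy gate are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> Vector:
    """Softmax over active experts only; masked-out slots get weight 0."""
    active = [l for l, m in zip(logits, mask) if m]
    if not active:
        raise ValueError("at least one expert must be active")
    peak = max(active)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def augment_observation(obs: Sequence[float], sla: Sequence[float]) -> Vector:
    """Append the SLA weight vector to the observation."""
    return list(obs) + list(sla)


def fuse_actions(
    obs: Sequence[float],
    sla: Sequence[float],
    experts: Sequence[Policy],
    gate_logits_fn: Policy,
    mask: Sequence[bool],
) -> Vector:
    """One decision step: gate on (obs, SLA), then mix frozen expert actions."""
    # Assumption: the gate sees the augmented observation, experts the raw one.
    weights = masked_softmax(gate_logits_fn(augment_observation(obs, sla)), mask)
    actions = [expert(obs) for expert in experts]
    dim = len(actions[0])
    # Assumption: fusion is a weighted average of the expert actions.
    return [sum(w * a[i] for w, a in zip(weights, actions)) for i in range(dim)]


# Hypothetical frozen experts: each returns resource shares for three slices.
toy_experts: List[Policy] = [
    lambda obs: [0.6, 0.2, 0.2],
    lambda obs: [0.2, 0.6, 0.2],
    lambda obs: [0.2, 0.2, 0.6],
    lambda obs: [0.4, 0.3, 0.3],  # fourth slot, masked until it is added
]


def toy_gate(aug_obs: Sequence[float]) -> Vector:
    """Stand-in for the trained gate: reuse the SLA weights as logits."""
    return list(aug_obs[-3:]) + [0.0]


before_expansion = fuse_actions(
    [0.5, 0.1], [0.7, 0.2, 0.1], toy_experts, toy_gate, [True, True, True, False]
)
after_expansion = fuse_actions(
    [0.5, 0.1], [0.7, 0.2, 0.1], toy_experts, toy_gate, [True, True, True, True]
)
```

In this sketch, flipping the last mask entry from `False` to `True` brings the fourth expert into the mixture without touching the other experts, which is one plausible reading of how masked expansion leaves existing experts undisturbed. In the actual framework the gate would be a trained DRL policy rather than the fixed rule used here.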
鈴木 et al. (Thu,) studied this question.