ABSTRACT This paper presents a reinforcement learning (RL)‐based framework for optimising modulation classification, power allocation, and resource management in non‐orthogonal multiple access (NOMA) systems. The primary objective is to use machine learning techniques, particularly deep reinforcement learning (DRL) and Q‐learning, to enhance spectral efficiency, minimise interference, and improve signal detection accuracy. The implemented algorithm incorporates state representation, temporal difference (TD) learning, experience replay, policy optimisation, and generalised advantage estimation (GAE). The results trace the evolution of the Q‐values, policy gradients, proximal policy optimisation (PPO) loss, and TD error, providing insight into the agent's learning process and convergence behaviour. The findings highlight the effectiveness of RL in dynamic wireless environments, where it offers adaptive, data‐driven solutions for NOMA systems in cases where traditional optimisation methods struggle. By analysing the stability and efficiency of the proposed approach, this study underscores the potential of RL to enhance modulation classification and resource allocation in next‐generation intelligent communication networks. The results further show that integrating softmax policy selection and prioritised experience replay significantly improves the agent's learning efficiency, yielding faster convergence and better decision‐making under varying channel conditions. These insights lay a foundation for future research on RL‐driven autonomous communication networks, enabling robust and intelligent resource management in 5G and beyond wireless systems.
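As a rough illustration of three components named in the abstract (softmax policy selection, the TD error, and GAE), the following minimal Python sketch shows their standard textbook forms. It is not the authors' implementation. The function names (`softmax_policy`, `td_errors`, `gae`), the temperature parameter, and the default values of `gamma` and `lam` are assumptions chosen for illustration only.

```python
import math

def softmax_policy(q_values, temperature=1.0):
    """Map Q-values to action probabilities (softmax / Boltzmann selection).

    `temperature` is a hypothetical knob: higher values flatten the distribution.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def td_errors(rewards, values, gamma=0.99):
    """One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must hold len(rewards) + 1 entries (the last is the bootstrap value).
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalised advantage estimation: A_t = sum_l (gamma * lam)^l * delta_{t+l}."""
    deltas = td_errors(rewards, values, gamma)
    advantages = [0.0] * len(deltas)
    running = 0.0
    # Accumulate backwards so each step reuses the discounted sum of later steps.
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

In a NOMA setting, the entries of `q_values` could correspond to candidate power levels or modulation hypotheses, and `rewards` to a per-step measure such as achieved rate. That mapping is an assumption made here for illustration, not something the abstract specifies.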
Alammar et al. studied this question.