• Jointly optimizes region partitioning and region-specific power pool design.
• Proposes a dual-agent DRL framework for cooperative region and power decisions.
• Reduces action-space complexity using modular base-station agents under CTDE.
• Demonstrates higher energy efficiency and SIC success than fixed-power and gap-based baselines, and outperforms a single-agent design in convergence.

Achieving energy-efficient transmission with high decoding reliability is a fundamental challenge for massive machine-type communication (mMTC) using grant-free Non-Orthogonal Multiple Access (NOMA), owing to dense device activity, sporadic traffic, and strong uplink interference. This paper introduces a Centralized Dual-Agent Deep Reinforcement Learning (CDA-DRL) framework that jointly optimizes region partitioning and transmit power pool design in uplink grant-free NOMA. Two cooperative agents at the base station learn the number of spatial regions and the number of power levels, respectively, using recurrent Deep Q-Networks under a centralized training and decentralized execution (CTDE) paradigm. This factorized architecture reduces the action-space complexity and enables scalable learning. Simulation results demonstrate that CDA-DRL achieves more stable training, higher Successive Interference Cancellation (SIC) decoding success, and significantly improved energy efficiency, outperforming geometric gap-based baselines by up to 114% and fixed-power schemes by 33%.
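The action-space reduction claimed for the factorized design can be illustrated with a small counting sketch. It is not the paper's implementation: the candidate sets `region_choices` and `power_level_choices` are hypothetical values chosen for illustration, since the abstract does not state the actual ranges. The sketch only compares how many Q-value outputs a single joint agent would need with how many two separate agents would need.

```python
from itertools import product

# Hypothetical candidate sets (assumptions; the source gives no actual ranges).
region_choices = [1, 2, 3, 4, 5, 6]   # candidate numbers of spatial regions
power_level_choices = [2, 3, 4, 5]    # candidate numbers of power levels


def joint_action_space(regions, powers):
    """Actions of a single agent that picks both quantities at once."""
    return list(product(regions, powers))


def factorized_action_spaces(regions, powers):
    """Separate action sets for a region agent and a power agent."""
    return list(regions), list(powers)


joint = joint_action_space(region_choices, power_level_choices)
region_actions, power_actions = factorized_action_spaces(
    region_choices, power_level_choices
)

# Number of Q-values one joint agent must estimate per state.
joint_outputs = len(joint)
# Total Q-values across the two factorized agents per state.
factorized_outputs = len(region_actions) + len(power_actions)

print(f"joint agent outputs:       {joint_outputs}")
print(f"factorized agents outputs: {factorized_outputs}")
```

With these illustrative sets, the joint agent's output size grows as the product of the two set sizes, while the dual-agent design grows as their sum, so the gap widens as either candidate set grows.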
Ravi et al. (Sun,) studied this question.