What question did this study set out to answer?

The research aims to optimize region partitioning and power pool design in mMTC using a dual-agent framework.

April 1, 2026 · Open Access

Centralized Dual-Agent DRL for Joint Region and Power Resource Optimization in mMTC

Key Points

Implemented a dual-agent deep reinforcement learning framework for optimization.
Used recurrent deep Q-networks so that each agent learns independently: one the number of spatial regions, the other the number of power levels.
Applied a centralized training and decentralized execution paradigm to reduce complexity.
Conducted simulations to compare performance against fixed and gap-based baselines.
Achieved up to 114% higher energy efficiency than geometric gap-based baselines.
Demonstrated 33% higher energy efficiency than fixed-power schemes.
Showed superior training stability and SIC decoding success rates.
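
To make the "SIC decoding success" metric concrete, the following is a minimal textbook-style sketch of successive interference cancellation, not the paper's simulator. It assumes perfect cancellation of each decoded signal and a single SINR threshold; the function name `sic_decode_count` and all numeric values are hypothetical.

```python
def sic_decode_count(rx_powers_w, noise_w, sinr_threshold):
    """Count how many signals SIC decodes before the first failure.

    Assumptions (illustrative only): received powers are in watts,
    decoded signals are cancelled perfectly, and decoding succeeds
    when the linear SINR meets the threshold.
    """
    # SIC decodes the strongest remaining signal first.
    remaining = sorted(rx_powers_w, reverse=True)
    decoded = 0
    while remaining:
        strongest = remaining[0]
        # All weaker, not-yet-decoded signals act as interference.
        interference = sum(remaining[1:])
        sinr = strongest / (interference + noise_w)
        if sinr < sinr_threshold:
            break  # decoding stalls; the weaker signals are lost too
        decoded += 1
        remaining.pop(0)  # cancel the decoded signal
    return decoded


# Well-separated received powers: every signal clears the threshold.
separated = sic_decode_count([8.0, 2.0, 0.5], noise_w=0.1, sinr_threshold=2.0)
# Equal received powers: the first decoding attempt already fails.
collided = sic_decode_count([1.0, 1.0, 1.0], noise_w=0.1, sinr_threshold=2.0)
```

Under these assumptions, well-separated power levels let the receiver peel off signals one by one, while devices arriving at equal power block decoding entirely. This is one way to see why the design of the power pool matters for SIC success.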

Abstract

• Jointly optimizes region partitioning and region-specific power pool design.
• Proposes a dual-agent DRL framework for cooperative region and power decisions.
• Reduces action space complexity using modular base-station agents under CTDE.
• Demonstrates superior energy efficiency and SIC success compared to fixed or gap-based baselines and outperforms single-agent in convergence.

Achieving energy-efficient transmission with high decoding reliability is a fundamental challenge for massive machine-type communication (mMTC) using grant-free Non-Orthogonal Multiple Access (NOMA), due to dense device activity, sporadic traffic, and strong uplink interference. This paper introduces a Centralized Dual-Agent Deep Reinforcement Learning (CDA-DRL) framework that jointly optimizes region partitioning and transmit power pool design in uplink grant-free NOMA. Two cooperative agents at the base station independently learn the number of spatial regions and the number of power levels, respectively, using recurrent Deep Q-Networks under a centralized training and decentralized execution paradigm. This factorized architecture reduces the action-space complexity and enables scalable learning. Simulation results demonstrate that CDA-DRL achieves more stable training, higher Successive Interference Cancellation (SIC) decoding success, and significantly improved energy efficiency, outperforming geometric gap-based baselines by up to 114% and fixed-power schemes by 33%.
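
The claim that a factorized architecture reduces action-space complexity can be illustrated with simple counting. The sketch below compares the number of Q-value outputs a single joint agent would need with the total needed by two separate agents. The candidate ranges (1 to 8 regions, 1 to 6 power levels) are hypothetical and are not taken from the paper.

```python
def joint_action_count(region_options, power_options):
    """Outputs needed by one agent that picks a (regions, levels) pair."""
    return len(region_options) * len(power_options)


def factorized_output_count(region_options, power_options):
    """Total outputs when one agent picks regions and another picks levels."""
    return len(region_options) + len(power_options)


# Hypothetical candidate sets, chosen only for illustration.
region_options = range(1, 9)  # 1..8 spatial regions
power_options = range(1, 7)   # 1..6 power levels

joint = joint_action_count(region_options, power_options)
factorized = factorized_output_count(region_options, power_options)
print(f"joint agent outputs: {joint}, dual-agent outputs: {factorized}")
```

The set of reachable configurations is the same in both cases. What shrinks is the number of actions each network must rank, which grows as a sum rather than a product as the candidate sets expand.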

Cite This Study

Ravi et al. studied this question.

synapsesocial.com/papers/69cd7ac55652765b073a8366 https://doi.org/10.1016/j.comnet.2026.112262
