A Transformer-enhanced hierarchical deep reinforcement learning (TH-DRL) framework is proposed for optimizing spectral and energy efficiency in 6G MIMO-MC-CDMA systems. Unlike existing flat DRL or optimization-based approaches, TH-DRL integrates a lightweight Transformer encoder with a two-tier hierarchical DRL model to jointly optimize subcarrier allocation, power control, and successive interference cancellation (SIC) ordering, improving adaptability in dynamic wireless conditions. The Transformer learns spatiotemporal dependencies from channel state information, interference patterns, and user dynamics to generate context-aware features that are used for decision-making. A high-level policy-gradient agent handles subcarrier allocation and user clustering, while a low-level DQN agent manages power control and SIC order, jointly improving throughput and energy efficiency. Convergence, scalability, and detection performance are evaluated on Rayleigh channels at 28/100 GHz with bandwidths of 400 MHz–1 GHz, with 10–100 users served by a base station equipped with 64–256 antennas. Training uses a replay buffer of 10⁴–10⁶ samples over 5,000–10,000 episodes. The results show that training converges stably around episode 600, with consistent gains over the baseline methods: 15–18% higher spectral efficiency, up to 22% energy savings, and peak performance of SE = 32.7 bits/s/Hz, EE = 14.8 bits/J, SINR ≈ 34 dB, and BER ≈ 10⁻⁵. These results indicate that Transformer-enhanced hierarchical DRL offers scalable, low-latency, energy-aware resource management for dense 6G networks.
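The two-tier decision flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, array shapes, and random stand-ins for the Transformer features, policy weights, and Q-values are all assumptions made for illustration, and the SIC order here uses a simple strongest-first heuristic in place of the learned ordering.

```python
import numpy as np

def high_level_allocate(features, weights, rng):
    """Sample one subcarrier per user from a softmax policy (policy-gradient style)."""
    logits = features @ weights                    # shape: (users, subcarriers)
    logits -= logits.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(probs.shape[1], p=p) for p in probs])

def low_level_power(q_values, epsilon, rng):
    """Epsilon-greedy choice of a discrete power level per user (DQN-style action selection)."""
    n_users, n_levels = q_values.shape
    greedy = q_values.argmax(axis=1)
    explore = rng.random(n_users) < epsilon
    random_levels = rng.integers(0, n_levels, size=n_users)
    return np.where(explore, random_levels, greedy)

def sic_order(channel_gains):
    """Strongest-first decoding order; a heuristic stand-in for the learned SIC ordering."""
    return np.argsort(-channel_gains)

# Hypothetical sizes and random stand-ins; none of these values come from the paper.
rng = np.random.default_rng(0)
n_users, n_feat, n_sub, n_levels = 10, 8, 16, 4
features = rng.standard_normal((n_users, n_feat))    # stand-in for Transformer encoder output
weights = rng.standard_normal((n_feat, n_sub))       # stand-in for high-level policy parameters
q_values = rng.standard_normal((n_users, n_levels))  # stand-in for low-level Q-network output
gains = rng.random(n_users)                          # stand-in for per-user channel gains

subcarriers = high_level_allocate(features, weights, rng)
power_levels = low_level_power(q_values, epsilon=0.1, rng=rng)
order = sic_order(gains)
```

In this sketch the high-level choice is stochastic (sampled from the policy) while the low-level choice is mostly greedy, which mirrors the usual split between policy-gradient and value-based action selection.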
Saranya et al. (Tue,) studied this question.