What question did this study set out to answer?

To optimize spectral and energy efficiency in 6G MIMO-MC-CDMA systems using deep reinforcement learning.

May 7, 2026 · Open Access

Transformer-assisted hierarchical deep reinforcement learning for energy and spectrum efficient MIMO-MC-CDMA in 6G networks

Key Points

Proposed the TH-DRL framework, which integrates a lightweight Transformer encoder with a hierarchical decision-making architecture.
Used a two-tier DRL model, with a high-level agent for subcarrier allocation and a low-level agent for power control.
Evaluated on Rayleigh fading channels at 28/100 GHz with bandwidths of 400 MHz–1 GHz and 10–100 users.
Achieved 15–18% higher spectral efficiency than baseline methods.
Provided up to 22% energy savings over baseline methods.
Demonstrated stable convergence around episode 600, with peak SE of 32.7 bits/s/Hz and EE of 14.8 bits/J.
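The percentages in the key points are relative comparisons against baselines. A minimal sketch of how such figures are typically computed, assuming the textbook definitions: SE as the sum of Shannon rates log2(1 + SINR) over streams, and EE as throughput (SE × bandwidth) divided by total consumed power. This summary does not give the paper's exact formulas or how its bits/J figure is normalized, and the SINR values, bandwidth, and power in the usage lines are hypothetical.

```python
import numpy as np


def spectral_efficiency(sinr_db):
    """Sum of per-stream Shannon rates log2(1 + SINR), in bits/s/Hz."""
    sinr = 10.0 ** (np.asarray(sinr_db, dtype=float) / 10.0)  # dB -> linear
    return float(np.sum(np.log2(1.0 + sinr)))


def energy_efficiency(se, bandwidth_hz, total_power_w):
    """Delivered bits per joule: (SE x bandwidth) / total consumed power (assumed definition)."""
    return se * bandwidth_hz / total_power_w


def percent_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


def percent_saving(new_energy, baseline_energy):
    """Relative reduction in energy use versus a baseline, in percent."""
    return 100.0 * (baseline_energy - new_energy) / baseline_energy


if __name__ == "__main__":
    # Hypothetical per-stream SINRs (dB) for two schemes; illustrative only, not paper results.
    se_base = spectral_efficiency([20.0, 18.0, 15.0])
    se_new = spectral_efficiency([22.0, 20.0, 17.0])
    print(f"SE gain: {percent_gain(se_new, se_base):.1f}%")
    # Hypothetical 400 MHz bandwidth and 2 W total power.
    print(f"EE: {energy_efficiency(se_new, 400e6, 2.0):.3e} bits/J")
```

Under these definitions, a "22% energy saving" means the proposed scheme consumes 0.78 times the baseline energy for the same task.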

Abstract

Transformer-assisted hierarchical deep reinforcement learning (TH-DRL) is proposed for optimizing spectral efficiency (SE) and energy efficiency (EE) in 6G multiple-input multiple-output multi-carrier code-division multiple access (MIMO-MC-CDMA) systems. Unlike existing flat DRL or optimization-based approaches, TH-DRL integrates a lightweight Transformer encoder with a two-tier hierarchical DRL architecture to jointly optimize subcarrier allocation, power control, and successive interference cancellation (SIC) ordering, improving adaptability under dynamic wireless conditions. The Transformer learns spatiotemporal dependencies from channel state information (CSI), interference patterns, and user dynamics, and produces context-aware features on which both agents base their decisions. A high-level policy-gradient agent handles subcarrier allocation and user clustering, while a low-level deep Q-network (DQN) agent manages power control and SIC ordering; together they improve throughput and energy efficiency. Convergence, scalability, and detection performance are evaluated over Rayleigh fading channels at 28 GHz and 100 GHz with bandwidths of 400 MHz–1 GHz, for 10–100 users served by a base station equipped with 64–256 antennas. Training uses replay buffers of 10⁴–10⁶ samples over 5,000–10,000 episodes. The results show stable convergence by around episode 600 and consistent gains over the baseline methods: 15–18% higher spectral efficiency, up to 22% energy savings, and peak performance of SE = 32.7 bits/s/Hz, EE = 14.8 bits/J, SINR ≈ 34 dB, and BER ≈ 10⁻⁵. These results confirm that Transformer-enhanced hierarchical DRL offers scalable, low-latency, energy-aware resource management for dense 6G networks.
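The two-tier decision loop described in the abstract can be sketched in toy form. This is a minimal illustration under stated assumptions, not the authors' implementation: the Transformer encoder is reduced to one single-head self-attention layer with fixed random weights; the high-level policy-gradient agent is plain REINFORCE (no baseline) choosing one subcarrier per user; and the low-level DQN is swapped for a linear Q-function trained with one-step Q-learning on a replay buffer, without a target network. The low-level action covers only the power level; SIC order is not learned here but follows a fixed channel-gain ordering inside the reward. All sizes, power levels, the noise power, the feature layout, and the names `observe`, `step`, and `sic_sum_rate` are hypothetical.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Hypothetical toy sizes and action set, chosen only for illustration.
N_USERS, N_SUB, D = 6, 3, 8
POWER_LEVELS = np.array([0.25, 0.5, 1.0])  # assumed discrete transmit powers (W)
NOISE = 1e-2                               # assumed noise power


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def encode(x, p):
    """Single-head self-attention across users (stand-in for the Transformer encoder)."""
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    return softmax(q @ k.T / np.sqrt(D)) @ v


def sic_sum_rate(gains, powers, sub):
    """Sum rate (bits/s/Hz) with SIC among users that share a subcarrier.

    Assumed ordering: each user cancels weaker users' signals and treats
    stronger users' signals as interference (downlink NOMA convention).
    """
    total = 0.0
    for s in range(N_SUB):
        users = np.flatnonzero(sub == s)
        for k in users:
            interf = sum(powers[j] * gains[k] for j in users if gains[j] > gains[k])
            total += np.log2(1.0 + powers[k] * gains[k] / (interf + NOISE))
    return total


# Encoder weights stay fixed and random in this sketch; they are never trained.
params = {name: rng.normal(scale=D ** -0.5, size=(D, D)) for name in ("wq", "wk", "wv")}
params["w_hi"] = np.zeros((D, N_SUB))              # high-level policy weights
params["w_lo"] = np.zeros((D, len(POWER_LEVELS)))  # low-level linear Q-function weights
buffer = deque(maxlen=10_000)  # replay buffer (the paper reports 10^4 to 10^6 samples)


def observe():
    """Draw Rayleigh-fading power gains |h|^2 ~ Exp(1) plus a toy per-user feature vector."""
    gains = rng.exponential(1.0, size=N_USERS)
    x = rng.normal(size=(N_USERS, D))
    x[:, 0] = gains  # first feature carries the channel gain
    return x, gains


def step(x, gains, x_next, eps=0.1, lr=0.01, gamma=0.9, batch=4):
    h, h_next = encode(x, params), encode(x_next, params)
    # High level: sample one subcarrier per user from a softmax policy.
    probs = softmax(h @ params["w_hi"])
    sub = np.array([rng.choice(N_SUB, p=p) for p in probs])
    # Low level: epsilon-greedy choice of a power level per user from the Q-values.
    greedy = (h @ params["w_lo"]).argmax(axis=1)
    explore = rng.random(N_USERS) < eps
    a = np.where(explore, rng.integers(len(POWER_LEVELS), size=N_USERS), greedy)
    reward = sic_sum_rate(gains, POWER_LEVELS[a], sub)
    # REINFORCE update (no baseline): grad of sum log pi w.r.t. w_hi is h^T (onehot - probs).
    params["w_hi"] += lr * reward * (h.T @ (np.eye(N_SUB)[sub] - probs))
    # Store the transition, then apply one-step Q-learning updates on a replayed minibatch.
    buffer.append((h, a, reward, h_next))
    for i in rng.choice(len(buffer), size=min(batch, len(buffer)), replace=False):
        hb, ab, rb, hn = buffer[int(i)]
        target = rb + gamma * (hn @ params["w_lo"]).max(axis=1)
        td = target - (hb @ params["w_lo"])[np.arange(N_USERS), ab]
        for k in range(N_USERS):
            params["w_lo"][:, ab[k]] += lr * td[k] * hb[k]
    return sub, POWER_LEVELS[a], reward


if __name__ == "__main__":
    x, g = observe()
    for _ in range(200):
        x_next, g_next = observe()
        sub, powers, r = step(x, g, x_next)
        x, g = x_next, g_next
    print(f"last reward (sum rate): {r:.2f} bits/s/Hz")
```

The point of the split is that the high level first fixes which users share each subcarrier, and the low level then sets powers within that grouping. Replacing the linear Q-function with a neural network and adding a target network would bring the low level closer to a standard DQN.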
