What question did this study set out to answer?

The aim is to optimize relay selection and power allocation to enhance energy efficiency and throughput in underwater acoustic networks.

March 29, 2026 · Open Access

A Deep Reinforcement Learning Approach for Joint Resource Allocation in Time-Varying Underwater Acoustic Cooperative Networks

Key Points

The aim is to optimize relay selection and power allocation to enhance energy efficiency and throughput in underwater acoustic networks.
Utilized a deep hybrid reinforcement learning framework with a parameterized deep Q-Network architecture.
Integrated deterministic policy and value-based networks for power control and relay evaluation.
Incorporated a prioritized experience replay mechanism to enhance sample efficiency.
Analyzed the algorithm’s complexity and convergence properties theoretically.
The DHRL algorithm surpassed combinatorial bandit algorithms and conventional deep reinforcement learning baselines in energy efficiency.
Demonstrated increased robustness against channel estimation errors in simulations.
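The key points mention a prioritized experience replay mechanism but do not say which variant is used. The following is a minimal sketch that assumes the standard proportional scheme. Each transition's priority grows with its TD error, so rare, surprising channel transitions are replayed more often, and importance-sampling weights correct the resulting bias. The class name ProportionalReplay and the alpha, beta, and eps defaults are hypothetical. A production buffer would typically use a sum-tree rather than this O(N) list scan.

```python
import random
from typing import Any, List


class ProportionalReplay:
    """Minimal proportional prioritized replay buffer (list-based, O(N) sampling).

    Hypothetical sketch of standard proportional PER; the paper's exact variant,
    alpha/beta schedule, and data structure are not stated in this summary.
    """

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps      # keeps zero-error transitions sampleable
        self.data: List[Any] = []
        self.prios: List[float] = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition: Any) -> None:
        # New transitions get the current max priority so they are replayed soon.
        p = max(self.prios, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.prios.append(p)
        else:
            self.data[self.pos] = transition
            self.prios[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, k: int, beta: float = 0.4, rng=random):
        n = len(self.data)
        total = sum(self.prios)
        probs = [p / total for p in self.prios]
        idx = rng.choices(range(n), weights=probs, k=k)
        # Importance-sampling weights (N * P(i))^-beta, normalized by the batch max.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update(self, idx: List[int], td_errors: List[float]) -> None:
        # Priority p_i = (|delta_i| + eps)^alpha: large TD errors resurface more often.
        for i, d in zip(idx, td_errors):
            self.prios[i] = (abs(d) + self.eps) ** self.alpha
```

In use, the agent would call update() with the fresh TD errors of each sampled minibatch and multiply each per-sample loss by its returned weight.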

Abstract

Underwater acoustic sensor networks (UASNs) have emerged as a pivotal technology for ocean exploration, tactical surveillance, and environmental monitoring. However, the underwater acoustic channel poses severe challenges, including high propagation delay, limited bandwidth, and rapid time-varying multipath fading, which significantly degrade communication reliability. Cooperative communication, which exploits spatial diversity via relay nodes, offers a promising solution to these impairments. In this paper, we investigate the joint optimization of relay selection and power allocation in UASNs to maximize the long-term system energy efficiency and throughput. This problem is inherently complex due to the hybrid action space, which couples the discrete selection of relay nodes with the continuous allocation of transmission power, and the absence of real-time, perfect channel state information (CSI). To address these challenges, we propose a novel deep hybrid reinforcement learning (DHRL) framework utilizing a parameterized deep Q-Network (P-DQN) architecture. Unlike traditional approaches that discretize power levels or relax discrete constraints, our approach seamlessly integrates a deterministic policy network for continuous power control and a value-based network for discrete relay evaluation. Furthermore, we incorporate a prioritized experience replay (PER) mechanism to improve sample efficiency by focusing on rare but significant channel transition events. We provide a comprehensive theoretical analysis of the algorithm’s complexity and convergence properties. Extensive simulation results demonstrate that the proposed DHRL algorithm outperforms state-of-the-art combinatorial bandit algorithms and conventional deep reinforcement learning baselines in terms of system energy efficiency, and also exhibits superior robustness against channel estimation errors.
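The hybrid action space described in the abstract couples a discrete relay choice with a continuous transmit power. The following is a minimal sketch that assumes the usual P-DQN selection rule. The policy network proposes one power x_k(s) for every candidate relay k, and the value network scores each pair as Q(s, k, x_k). The relay with the highest score is then taken together with its proposed power. The functions select_hybrid_action and energy_efficiency, the toy Q-function, and all numeric values are hypothetical stand-ins for the paper's trained networks and system model. The energy-efficiency proxy is a common definition (Shannon rate divided by total consumed power, in bits per joule) and may differ from the paper's objective.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

State = Sequence[float]


def energy_efficiency(power: float, gain: float, bandwidth: float = 1.0,
                      noise: float = 1.0, p_circuit: float = 0.5) -> float:
    """Hypothetical EE proxy (bits/J): Shannon rate over transmit + circuit power."""
    rate = bandwidth * math.log2(1.0 + power * gain / noise)
    return rate / (power + p_circuit)


def select_hybrid_action(
    state: State,
    actor_fn: Callable[[State], List[float]],    # x_k(s): one power per relay k
    q_fn: Callable[[State, int, float], float],  # Q(s, k, x_k)
    epsilon: float = 0.0,
    p_max: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[int, float]:
    """P-DQN-style selection over a (discrete relay, continuous power) action."""
    rng = rng if rng is not None else random.Random()
    powers = actor_fn(state)
    if rng.random() < epsilon:
        # Explore: random relay with a random power in [0, p_max].
        k = rng.randrange(len(powers))
        return k, rng.uniform(0.0, p_max)
    # Exploit: score every relay with its own proposed power, keep the best.
    q_vals = [q_fn(state, k, p) for k, p in enumerate(powers)]
    k_star = max(range(len(powers)), key=q_vals.__getitem__)
    return k_star, powers[k_star]


if __name__ == "__main__":
    gains = [0.2, 1.5, 0.8]  # hypothetical relay channel gains used as the state
    action = select_hybrid_action(
        gains,
        actor_fn=lambda s: [1.0, 1.0, 1.0],
        q_fn=lambda s, k, p: energy_efficiency(p, s[k]),
    )
    print(action)  # -> (1, 1.0)
```

In this toy run, the learned value network is replaced by the one-step energy-efficiency proxy, so the relay with the largest gain wins. In a full P-DQN agent, Q(s, k, x_k) would be a trained network regressed toward a TD target. The policy network would be updated to increase the sum of Q-values across relays, and both training steps are omitted here.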

Cite This Study

Zeng et al. studied this question.

synapsesocial.com/papers/69c8c371de0f0f753b39e462
https://doi.org/10.3390/jmse14070616