Underwater acoustic sensor networks (UASNs) have emerged as a pivotal technology for ocean exploration, tactical surveillance, and environmental monitoring. However, the underwater acoustic channel poses severe challenges, including high propagation delay, limited bandwidth, and rapidly time-varying multipath fading, which significantly degrade communication reliability. Cooperative communication, which exploits spatial diversity via relay nodes, offers a promising way to mitigate these impairments. In this paper, we investigate the joint optimization of relay selection and power allocation in UASNs to maximize long-term system energy efficiency and throughput. The problem is difficult for two reasons: the action space is hybrid, coupling the discrete selection of relay nodes with the continuous allocation of transmission power, and real-time, perfect channel state information (CSI) is unavailable. To address these challenges, we propose a novel deep hybrid reinforcement learning (DHRL) framework based on a parameterized deep Q-network (P-DQN) architecture. Unlike traditional approaches that discretize power levels or relax discrete constraints, our approach combines a deterministic policy network for continuous power control with a value-based network for discrete relay evaluation. Furthermore, we incorporate a prioritized experience replay (PER) mechanism that improves sample efficiency by focusing on rare but significant channel transition events. We analyze the algorithm's computational complexity and convergence properties. Simulation results show that the proposed DHRL algorithm outperforms state-of-the-art combinatorial bandit algorithms and conventional deep reinforcement learning baselines in terms of system energy efficiency, and that it is more robust to channel estimation errors.
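To make the hybrid action structure concrete, the following is a minimal sketch of how a P-DQN-style agent might pick a relay and a transmit power in one step. It is not the authors' implementation: the class name `HybridAgentSketch`, the single linear layers standing in for deep networks, the state features, the sigmoid power squashing, and the epsilon-greedy exploration are all assumptions made for illustration.

```python
import numpy as np


def init_linear(rng, n_in, n_out):
    """Small random weights for one linear layer (illustrative stand-in for a deep net)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


class HybridAgentSketch:
    """Hypothetical P-DQN-style agent: continuous power per relay, discrete relay choice."""

    def __init__(self, state_dim, num_relays, p_max, seed=0):
        self.rng = np.random.default_rng(seed)
        self.num_relays = num_relays
        self.p_max = p_max
        # Deterministic policy ("actor"): state -> one power level per candidate relay.
        self.actor_w, self.actor_b = init_linear(self.rng, state_dim, num_relays)
        # Value network: (state, all power parameters) -> one Q-value per relay.
        self.q_w, self.q_b = init_linear(self.rng, state_dim + num_relays, num_relays)

    def powers(self, state):
        """Continuous part of the action: a power in (0, p_max) for every relay."""
        z = state @ self.actor_w + self.actor_b
        return self.p_max / (1.0 + np.exp(-z))  # scaled sigmoid (assumed squashing)

    def q_values(self, state, powers):
        """Discrete part: score each relay given the state and the proposed powers."""
        x = np.concatenate([state, powers / self.p_max])
        return x @ self.q_w + self.q_b

    def act(self, state, epsilon=0.0):
        """Return (relay index, transmit power) as one hybrid action."""
        powers = self.powers(state)
        if self.rng.random() < epsilon:
            relay = int(self.rng.integers(self.num_relays))  # explore
        else:
            relay = int(np.argmax(self.q_values(state, powers)))  # exploit
        return relay, float(powers[relay])
```

A call such as `HybridAgentSketch(state_dim=4, num_relays=3, p_max=2.0).act(state)` returns a relay index together with the power proposed for that relay. The point of the design is that neither part of the action is discretized or relaxed: the policy proposes a power for every relay, and the value network then compares the relays under those proposed powers.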
Zeng et al. (Fri,) studied this question.