Most existing studies adopt deep reinforcement learning (DRL) based on Q-learning because it converges quickly to a near-optimal solution, enabling effective allocation of resources and power. However, a DRL-based Q-network must discretize the continuous power values, which degrades performance. Moreover, allocating resources effectively is challenging under the fast-varying channel conditions of dynamic vehicular environments. In this work, we propose two approaches to overcome these challenges. First, we present a DRL-based energy-efficient resource allocation approach that uses a twin delayed deep deterministic policy gradient (TD3) scheme based on Thompson sampling to solve the power and resource allocation problem. Second, we present a dynamic meta-transfer learning framework that improves the policy's ability to adapt to new channel conditions. Simulation results show that the proposed Thompson-sampling-based TD3 approach improves system performance. Moreover, the proposed DRL-based dynamic meta-transfer learning framework requires 80% fewer samples to adapt to a new environment.
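To make the contrast with a discretizing Q-network concrete, the sketch below shows the standard TD3 ingredients applied to a single continuous transmit-power action. These are the clipped double-Q target, target policy smoothing, and delayed actor updates. It is a minimal illustration under stated assumptions, not the authors' implementation. The power budget `p_max` and the hyperparameters (`gamma=0.99`, `sigma=0.2`, `noise_clip=0.5`, `policy_delay=2`) are assumed, commonly used TD3 defaults. The abstract does not say how Thompson sampling is combined with TD3, so that component is deliberately omitted.

```python
import random


def smoothed_target_action(mu_next, p_max, sigma=0.2, noise_clip=0.5, rng=random):
    """TD3 target policy smoothing for a continuous power action.

    Adds clipped Gaussian noise to the target actor's output mu_next and
    clips the result to the feasible power range [0, p_max].
    p_max, sigma and noise_clip are assumed values for illustration.
    """
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    return max(0.0, min(p_max, mu_next + noise))


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next are the two target critics evaluated at the
    next state and the smoothed target action.
    """
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: the actor and target networks are updated
    only once every policy_delay critic updates."""
    return step % policy_delay == 0


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical next-state actor output (watts) and power budget.
    a_next = smoothed_target_action(mu_next=0.6, p_max=1.0, rng=rng)
    # Hypothetical critic values at (s', a_next); the smaller one is used.
    y = td3_target(reward=1.0, done=False, q1_next=5.0, q2_next=3.0, gamma=0.5)
    print(round(a_next, 3), y)
```

Because the actor outputs a real-valued power instead of an index into a fixed set of power levels, no quantization is needed. Taking the minimum of the two critics counters the Q-value overestimation that a single critic tends to produce.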
Sohaib et al. studied this question.