Reinforcement learning (RL) is increasingly widely applied in artificial intelligence, but it requires a large number of samples, which has made sample efficiency a research focus. To address this issue, we propose a novel N-step method. The method extends the agent's horizon so that it can acquire more useful long-term information, which improves data efficiency in RL. The N-step method also reduces the variance of the Q-function estimate, one of the two main sources of Q-function estimation error. The other source is estimation bias. To mitigate this bias, we design a regularization method based on the V-function, an approach that has been underexplored. Together, the two methods address both low sample efficiency and inaccurate Q-function estimation in RL. Finally, extensive experiments in discrete and continuous action spaces show that the proposed N-step method is effective when combined with the classical deep Q-network (DQN), deep deterministic policy gradient (DDPG), and TD3 algorithms, consistently outperforming the classical algorithms.
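For orientation, the following is a minimal sketch of the standard n-step bootstrapped target that N-step methods in general build on. It is not the novel variant proposed in the paper, whose details the abstract does not give. The function name `n_step_target`, its parameters, and the discount value are illustrative assumptions.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float],
                  bootstrap_value: float,
                  gamma: float = 0.99) -> float:
    """Textbook n-step TD target (hypothetical helper, not the paper's method).

    Computes r_0 + gamma * r_1 + ... + gamma^(n-1) * r_(n-1)
             + gamma^n * bootstrap_value,
    where n = len(rewards) and bootstrap_value is the value estimate
    (for example max_a Q(s_n, a) in a DQN-style agent) at the state
    reached after n steps.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first, so each
    # pass applies one more factor of gamma to everything after it.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Three observed rewards followed by a bootstrapped value estimate.
example_target = n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

With `n = 1` this reduces to the usual one-step TD target. Larger `n` puts more weight on observed rewards and less on the bootstrapped estimate, which is the sense in which an N-step method extends the agent's horizon.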
Zhang et al. studied this question.