Efficient Reinforcement Learning With the Novel N-Step Method and V-Network

Abstract

Reinforcement learning (RL) is increasingly widely applied in artificial intelligence. Its drawbacks are also apparent: it requires a large number of samples, which has made improving sample efficiency a research focus. To address this issue, we propose a novel N-step method. The method extends the agent's horizon, enabling it to acquire more effective long-term information and thereby addressing the data inefficiency of RL. The N-step method also reduces the variance of the Q-function estimate, which is one source of Q-function estimation error. Estimation bias is another. To mitigate this bias, we design a regularization method based on the V-function, an approach that has been underexplored. Together, the two methods address the problems of low sample efficiency and inaccurate Q-function estimation in RL. Finally, extensive experiments in discrete and continuous action spaces demonstrate that the proposed N-step method, when combined with the classical deep Q-network (DQN), deep deterministic policy gradient (DDPG), and TD3 algorithms, is effective and consistently outperforms the classical algorithms.
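
For orientation, the classical n-step target that N-step methods build on can be sketched as below. This is not the novel variant proposed in the paper, whose details the abstract does not give; the function name `n_step_target`, its parameters, and the default discount factor are assumptions made for illustration only.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> float:
    """Classical n-step TD target (illustrative; not the paper's variant).

    Computes r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1}
    + gamma^n * bootstrap_value, where n = len(rewards) and
    bootstrap_value stands in for an estimate such as
    max_a Q(s_{t+n}, a) or V(s_{t+n}).
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first, so each
    # pass applies one more factor of gamma to everything after it.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


if __name__ == "__main__":
    # Three observed rewards followed by a bootstrapped value estimate.
    print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

With a single reward this reduces to the familiar one-step target `r + gamma * bootstrap_value`. Longer reward sequences rely more on observed returns and less on the bootstrapped estimate, which is the sense in which an N-step method extends the agent's horizon.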
