May 7, 2024 | Open Access

An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

Abstract

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general L-layer neural network. We introduce new proof techniques and derive an improved sample complexity of $O(\epsilon^{-1})$. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves $O(\epsilon^{-1})$ complexity under Markovian sampling, compared with the best previously known $O(\epsilon^{-2})$ complexity in the existing literature.
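To make the setting concrete, here is a minimal sketch of neural TD as the abstract describes it: semi-gradient TD(0) evaluation of action values for a fixed policy, using an L-layer ReLU network and samples drawn from a single Markovian trajectory (consecutive, correlated transitions rather than i.i.d. draws). Everything specific here is an illustrative assumption, not taken from the paper: the toy random MDP, the one-hot features, the layer widths, the constant step size, and the helper names (`init_mlp`, `neural_td`, `exact_q`, and so on). It does not reproduce the authors' algorithmic details (for example, any projection step or step-size schedule) or their analysis.

```python
import numpy as np

# Hypothetical toy problem: a random finite MDP evaluated under a fixed uniform policy.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9
_rng = np.random.default_rng(0)
P = _rng.random((N_STATES, N_ACTIONS, N_STATES))
P /= P.sum(axis=2, keepdims=True)                      # transition kernel P[s, a, s']
R = _rng.random((N_STATES, N_ACTIONS))                 # deterministic rewards r(s, a)
PI = np.full((N_STATES, N_ACTIONS), 1.0 / N_ACTIONS)   # fixed policy pi(a | s)

def features(s, a):
    """One-hot encoding of the state-action pair (an illustrative choice)."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def init_mlp(sizes, rng):
    """L-layer ReLU network; `sizes` lists input, hidden, and output widths."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (n, m)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the scalar output Q(s, a) and the per-layer activations for backprop."""
    acts, h = [x], x
    for i, (W, b) in enumerate(params):
        z = W @ h + b
        h = z if i == len(params) - 1 else np.maximum(z, 0.0)  # linear output layer
        acts.append(h)
    return h[0], acts

def grad(params, acts):
    """Gradient of the scalar output w.r.t. every (W, b), via manual backprop."""
    grads = [None] * len(params)
    delta = np.ones(1)                                  # d(output) / d(output)
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads[i] = (np.outer(delta, acts[i]), delta.copy())
        if i > 0:
            delta = (W.T @ delta) * (acts[i] > 0)       # ReLU derivative
    return grads

def neural_td(steps=20000, alpha=0.02, sizes=(10, 32, 32, 1), seed=1):
    """Semi-gradient TD(0) for Q^pi along one trajectory (Markovian sampling)."""
    rng = np.random.default_rng(seed)
    params = init_mlp(sizes, rng)
    s = 0
    a = rng.choice(N_ACTIONS, p=PI[s])
    for _ in range(steps):
        s_next = rng.choice(N_STATES, p=P[s, a])        # next sample depends on current one
        a_next = rng.choice(N_ACTIONS, p=PI[s_next])
        q, acts = forward(params, features(s, a))
        q_next, _ = forward(params, features(s_next, a_next))
        td_error = R[s, a] + GAMMA * q_next - q
        # Differentiate only through Q(s, a); the bootstrapped target is held fixed.
        for (W, b), (gW, gb) in zip(params, grad(params, acts)):
            W += alpha * td_error * gW                  # in-place update of the layer
            b += alpha * td_error * gb
        s, a = s_next, a_next
    return params

def q_table(params):
    """Evaluate the network on every state-action pair."""
    return np.array([[forward(params, features(s, a))[0] for a in range(N_ACTIONS)]
                     for s in range(N_STATES)])

def exact_q():
    """Solve Q = R + gamma * P_pi Q exactly, for comparison."""
    M = (P[:, :, :, None] * PI[None, None, :, :]).reshape(N_STATES * N_ACTIONS, -1)
    q = np.linalg.solve(np.eye(M.shape[0]) - GAMMA * M, R.ravel())
    return q.reshape(N_STATES, N_ACTIONS)

if __name__ == "__main__":
    q_hat = q_table(neural_td())
    print("max |Q_hat - Q_pi|:", np.abs(q_hat - exact_q()).max())
```

Comparing against the exact solution of the Bellman equation is only possible because the toy MDP is tiny; the point of the sketch is the update rule and the correlated, single-trajectory sampling that the paper's finite-time analysis has to handle.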

Cite This Study

Ke et al. (2024). An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks.

synapsesocial.com/papers/68e6b4c2b6db6435876358e2 https://doi.org/10.48550/arxiv.2405.04017