What question did this study set out to answer?

This research set out to develop a Gauss-Newton temporal difference method that solves the Q-learning problem efficiently with nonlinear function approximation.

February 28, 2026

Gauss–Newton Temporal Difference Learning With Nonlinear Function Approximation

Key Points

Proposed the Gauss-Newton temporal difference (GNTD) learning method
Optimized a variant of the mean-squared Bellman error (MSBE), taking one Gauss-Newton step per iteration
Adopted target networks to avoid double sampling
Analyzed inexact Gauss-Newton steps so that updates can be computed safely by cheap matrix iterations
Tested the method on several reinforcement learning benchmarks
Achieved an improved sample complexity of O(ε^-1) for neural networks with ReLU activation, compared with O(ε^-2) for existing neural TD methods
Established an O(ε^-1.5) sample complexity for general smooth function approximations
Demonstrated higher rewards and faster convergence than TD-type methods

Abstract

In this article, a Gauss-Newton temporal difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximations. In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of mean-squared Bellman error (MSBE), where target networks are adopted to avoid double sampling. Inexact GN steps are analyzed so that one can safely and efficiently compute the GN updates by cheap matrix iterations. Under mild conditions, nonasymptotic finite-sample convergence to the globally optimal Q function is derived for various nonlinear function approximations. In particular, for neural network parameterization with ReLU activation, GNTD achieves an improved sample complexity of O(ε^-1), as opposed to the O(ε^-2) sample complexity of the existing neural temporal difference (TD) methods. An O(ε^-1.5) sample complexity of GNTD is also established for general smooth function approximations. We validate our method via extensive experiments on several reinforcement learning (RL) benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.
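
To make the update described in the abstract concrete, here is a minimal NumPy sketch of one damped Gauss-Newton step on a target-network Bellman residual. It is an illustration under stated assumptions, not the authors' implementation: the tiny tanh network (the paper analyzes ReLU networks and general smooth models), the use of conjugate gradient as the "cheap matrix iteration", the damping term, the hard target copy, the synthetic batch, and every function name are hypothetical choices made for this example. Terminal states are ignored for brevity.

```python
import numpy as np

def unpack(theta, d, h):
    """Split flat parameters into hidden weights W (d, h) and output weights v (h,)."""
    return theta[: d * h].reshape(d, h), theta[d * h:]

def q_only(theta, X, h):
    """Q(x) = tanh(x W) v; X may have shape (n, d) or (n, n_actions, d)."""
    W, v = unpack(theta, X.shape[-1], h)
    return np.tanh(X @ W) @ v

def q_and_jacobian(theta, X, h):
    """Q values for the rows of X and the Jacobian dQ/dtheta, shape (n, d*h + h)."""
    n, d = X.shape
    W, v = unpack(theta, d, h)
    H = np.tanh(X @ W)                                    # (n, h)
    dW = X[:, :, None] * ((1.0 - H**2) * v)[:, None, :]   # (n, d, h)
    return H @ v, np.concatenate([dW.reshape(n, d * h), H], axis=1)

def conjugate_gradient(matvec, b, iters):
    """Approximately solve A x = b for symmetric positive definite A, starting at x = 0."""
    x, r = np.zeros_like(b), b.copy()
    p, rs = r.copy(), b @ b
    for _ in range(iters):
        if rs < 1e-20:          # residual is numerically zero, stop early
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def gn_direction(J, delta, damping, cg_iters):
    """Inexact solve of (J^T J / n + damping * I) p = J^T delta / n; returns (p, gradient)."""
    n = J.shape[0]
    grad = J.T @ delta / n
    matvec = lambda u: J.T @ (J @ u) / n + damping * u    # never forms J^T J explicitly
    return conjugate_gradient(matvec, grad, cg_iters), grad

def gntd_step(theta, theta_target, batch, h, gamma=0.99, step=1.0,
              damping=0.1, cg_iters=10):
    """One Gauss-Newton step on 0.5 * mean((Q_theta(s, a) - y)^2) with a frozen target y."""
    X, r, X_next = batch
    # The target uses theta_target only, so it is a constant with respect to theta.
    y = r + gamma * q_only(theta_target, X_next, h).max(axis=1)
    q, J = q_and_jacobian(theta, X, h)
    p, _ = gn_direction(J, q - y, damping, cg_iters)
    return theta - step * p

# Usage on a synthetic batch (random features and rewards, purely illustrative).
rng = np.random.default_rng(0)
d, h, n, n_actions = 4, 8, 64, 3
theta = 0.1 * rng.standard_normal(d * h + h)
batch = (rng.standard_normal((n, d)),             # features of (s, a)
         rng.standard_normal(n),                  # rewards
         rng.standard_normal((n, n_actions, d)))  # features of (s', a') for every a'
theta_target = theta.copy()
for _ in range(5):
    theta = gntd_step(theta, theta_target, batch, h)
theta_target = theta.copy()                       # periodic hard sync of the target
```

Because the target is computed from a separate parameter copy, the residual depends on the current parameters only through Q(s, a), which is what lets a single sampled transition be used without double sampling. The matrix-free `matvec` keeps each conjugate gradient iteration at the cost of two Jacobian products, and stopping after a few iterations gives an inexact step in the spirit of the analysis the abstract mentions.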
