February 28, 2024

Relative Q-Learning for Average-Reward Markov Decision Processes With Continuous States

Abstract

Markov decision processes are widely used to model sequential decision-making problems under uncertainty. We propose an online algorithm for solving a class of average-reward Markov decision processes with continuous state spaces in a model-free setting. The algorithm combines classical relative Q-learning with an asynchronous averaging procedure, which permits the Q-value estimate at a state-action pair to be updated based on observations at neighboring pairs sampled in subsequent iterations. These point estimates are then retained and used to construct an interpolation-based function approximator that predicts the Q-function values at unexplored state-action pairs. We show that, with probability one, the sequence of function approximators converges to the optimal Q-function up to an additive constant. Numerical results on a simple benchmark example are reported to illustrate the algorithm.
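
For readers unfamiliar with the classical relative Q-learning update that the abstract builds on, the following is a minimal sketch of its tabular form on a finite MDP. It is not the paper's continuous-state algorithm: it has no asynchronous averaging and no interpolation-based approximator. The two-state MDP, the function name `relative_q_learning`, the reference pair `ref`, the constant step size, and the uniform sampling of state-action pairs are all assumptions chosen for illustration.

```python
import numpy as np


def relative_q_learning(P, R, ref=(0, 0), steps=20000, alpha=0.1, seed=0):
    """Tabular relative Q-learning (illustrative sketch, not the paper's method).

    P[s, a] is the (deterministic) next state and R[s, a] the reward.
    The value at the reference pair `ref` is subtracted from every target,
    which keeps the iterates bounded in the average-reward setting.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(steps):
        # Assumption: state-action pairs are sampled uniformly at random.
        s = rng.integers(n_states)
        a = rng.integers(n_actions)
        s_next = P[s, a]
        # Relative target: reward plus best next value, offset by Q at `ref`.
        target = R[s, a] + Q[s_next].max() - Q[ref]
        Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Hypothetical MDP: action 0 stays, action 1 switches state.
# Any action taken in state 1 earns reward 1; state 0 earns nothing.
P = np.array([[0, 1], [1, 0]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])

Q = relative_q_learning(P, R)
gain_estimate = Q[0, 0]            # value at the reference pair
greedy_policy = Q.argmax(axis=1)   # greedy action in each state
```

In this toy problem the best long-run average reward is 1 (move to state 1 and stay there), and at a fixed point of the update the value at the reference pair equals that average reward. The learned table is therefore determined only relative to the reference value, which is the sense in which such methods recover the Q-function up to a constant.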
