What question did this study set out to answer?

The study investigates the limitations of Q-learning in countably infinite state spaces, where the algorithm can become unstable and fail to converge to the optimal policy.

April 3, 2026

When Q-Learning fails: unstable behavior for infinite state spaces

Key Points

The study explores how Q-learning, which is guaranteed to converge in finite-state environments, can fail to converge in countably infinite state spaces.
The authors introduce a queueing model, based on a load balancing problem, with a countably infinite state space.
They analyze how the two dispatcher actions, 'red' and 'green', affect stability and transience.
They run numerical experiments with decreasing step sizes that satisfy the Robbins-Monro conditions.
Under certain parameter conditions, Q-learning is unstable and fails to converge to the optimal policy.
The 'red' action leads to transient behavior, whereas the 'green' action ensures stability.
The experiments illustrate that the transience persists even when the step sizes satisfy the Robbins-Monro conditions.
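
For reference, the Robbins-Monro conditions mentioned above are the standard requirements on a step-size sequence in stochastic approximation, and they can be written next to the textbook Q-learning update. The discounted form with discount factor \gamma is an assumption made here for illustration; this summary does not state which optimality criterion the paper uses.

```latex
% Robbins-Monro conditions on the step sizes \alpha_n
\sum_{n=1}^{\infty} \alpha_n = \infty,
\qquad
\sum_{n=1}^{\infty} \alpha_n^2 < \infty .

% Textbook Q-learning update (discounted form, assumed here)
Q_{n+1}(s_n, a_n) = Q_n(s_n, a_n)
  + \alpha_n \Bigl[ r_n + \gamma \max_{a'} Q_n(s_{n+1}, a') - Q_n(s_n, a_n) \Bigr].
```

For example, \alpha_n = 1/n^p satisfies both conditions for any 1/2 < p \le 1.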

Abstract

The Q-learning algorithm is well known for its convergence guarantees to the optimal policy in finite-state environments. In this paper, we investigate its limitations in countably infinite state spaces – a setting common in real-world problems. To this end, we introduce a simple queueing model, based on a load balancing problem, with a countably infinite state space. In this model, a dispatcher assigns incoming jobs to one of two queues by choosing between two possible actions: "red" and "green". The "red" action leads to transient behavior, whereas the "green" action ensures stability. Our main result shows that, under certain parameter conditions, Q-learning exhibits instability and fails to converge to the optimal policy. Our findings reveal a critical gap in the theoretical understanding of model-free Reinforcement Learning methods in infinite domains. Numerical experiments illustrate that the transience also occurs with decreasing step sizes that satisfy the usual Robbins-Monro conditions.
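
To make the setting concrete, here is a minimal sketch of tabular Q-learning on a toy two-queue dispatcher. It is not the paper's model: the slotted dynamics, the arrival and service probabilities, the reward (negative total queue length), the discount factor, the epsilon-greedy exploration, and all function names are assumptions chosen for illustration. The only features taken from the abstract are the two actions, the unbounded state space, and a decreasing step size that satisfies the Robbins-Monro conditions.

```python
import random
from collections import defaultdict

ACTIONS = ("red", "green")


def step(state, action, rng, arrival=0.5, mu_red=0.3, mu_green=0.7):
    """One time slot of a hypothetical two-queue system (illustrative only)."""
    red, green = state
    # An arriving job is routed to the queue named by the chosen action.
    if rng.random() < arrival:
        if action == "red":
            red += 1
        else:
            green += 1
    # Each non-empty queue finishes one job with its own service probability.
    if red > 0 and rng.random() < mu_red:
        red -= 1
    if green > 0 and rng.random() < mu_green:
        green -= 1
    reward = -(red + green)  # assumed cost: total number of jobs in the system
    return (red, green), reward


def q_learning(num_steps, gamma=0.9, epsilon=0.1, power=0.75, seed=0):
    """Tabular Q-learning with per-pair step sizes 1 / visits**power."""
    rng = random.Random(seed)
    q = defaultdict(float)     # entries are created lazily, so the table is unbounded
    visits = defaultdict(int)  # visit counts per (state, action) pair
    state = (0, 0)
    for _ in range(num_steps):
        # Epsilon-greedy action selection (an assumed exploration scheme).
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = step(state, action, rng)
        visits[(state, action)] += 1
        # 1/n**power with 1/2 < power <= 1 satisfies the Robbins-Monro conditions.
        alpha = 1.0 / visits[(state, action)] ** power
        target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q, visits, state


if __name__ == "__main__":
    q, visits, final_state = q_learning(50_000)
    print("final queue lengths (red, green):", final_state)
    print("distinct state-action pairs visited:", len(visits))
```

With these assumed parameters, always choosing "red" overloads the red queue (arrival probability 0.5 against service probability 0.3), while always choosing "green" keeps the system stable. Tracking the queue lengths over a long run is one way to look for the kind of drift the abstract describes, although this toy gives no guarantee of reproducing the paper's result.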
