May 10, 2024 · Open Access

Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs

Key Points

The method achieves a rate-optimal sample complexity of $\tilde{O}(\epsilon^{-2-d/(\nu+1)})$, covering both smooth and Lipschitz MDPs.
For Lipschitz MDPs ($\nu=0$), this recovers the state-of-the-art rate of discretization approaches while using orthogonal trigonometric polynomials as features.
The approach is a perturbed version of least-squares value iteration, combined with a projection technique drawn from harmonic analysis. By bridging two previously conflicting perspectives on continuous-space MDPs, the findings may open new avenues for research in reinforcement learning and continuous decision-making.
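
To make the rate concrete, the short sketch below evaluates the exponent $2 + d/(\nu+1)$ for a few smoothness orders. The function name `rate_exponent` and the choice $d=4$ are illustrative assumptions, not taken from the paper, and logarithmic factors hidden by the tilde are ignored.

```python
def rate_exponent(d: int, nu: float) -> float:
    """Return p such that the sample complexity scales as epsilon**(-p).

    d is the state-action dimension and nu the order of smoothness.
    Logarithmic factors are ignored.
    """
    return 2.0 + d / (nu + 1.0)


# Illustrative values for a hypothetical 4-dimensional state-action space.
for nu in (0, 1, 3, 99):
    print(f"nu={nu:>2}: epsilon^-{rate_exponent(d=4, nu=nu):.2f}")
```

At $\nu=0$ the exponent is $2+d$, the Lipschitz case; as $\nu$ grows, the dimension-dependent term shrinks and the exponent approaches 2.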

Abstract

We consider the problem of learning an $\epsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, perturbed version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our $\tilde{O}(\epsilon^{-2-d/(\nu+1)})$ sample complexity, where $d$ is the dimension of the state-action space and $\nu$ the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs ($\nu=0$). At the same time, for $\nu\to\infty$, it recovers and greatly generalizes the $O(\epsilon^{-2})$ rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
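
As a rough illustration of regression with orthogonal trigonometric polynomials as features, the sketch below fits a smooth periodic function in one dimension. It is a simplified stand-in, not the paper's perturbed least-squares value iteration: it assumes a noiseless uniform grid rather than samples from a generative model, and the names `trig_features`, `fit_on_uniform_grid`, and `predict` are hypothetical. On such a grid the features are exactly orthogonal, so the least-squares solution reduces to normalized inner products.

```python
import math


def trig_features(x: float, degree: int) -> list[float]:
    """Features [1, cos(x), sin(x), ..., cos(degree*x), sin(degree*x)]."""
    feats = [1.0]
    for k in range(1, degree + 1):
        feats.append(math.cos(k * x))
        feats.append(math.sin(k * x))
    return feats


def fit_on_uniform_grid(f, degree: int, n: int) -> list[float]:
    """Least-squares coefficients of f on n uniform points in [0, 2*pi).

    With n > 2*degree the features are orthogonal on the grid, so each
    coefficient is an inner product divided by the feature's squared norm.
    """
    assert n > 2 * degree, "need more grid points than twice the degree"
    xs = [2.0 * math.pi * j / n for j in range(n)]
    ys = [f(x) for x in xs]
    rows = [trig_features(x, degree) for x in xs]
    coeffs = []
    for i in range(2 * degree + 1):
        inner = sum(row[i] * y for row, y in zip(rows, ys))
        norm_sq = n if i == 0 else n / 2.0  # constant feature vs. cos/sin
        coeffs.append(inner / norm_sq)
    return coeffs


def predict(coeffs: list[float], x: float) -> float:
    """Evaluate the fitted trigonometric polynomial at x."""
    degree = (len(coeffs) - 1) // 2
    return sum(c * phi for c, phi in zip(coeffs, trig_features(x, degree)))


# Assumed test function: smooth and 2*pi-periodic.
def target(x: float) -> float:
    return math.exp(math.sin(x))


coeffs = fit_on_uniform_grid(target, degree=6, n=64)
max_err = max(abs(predict(coeffs, 0.1 * i) - target(0.1 * i)) for i in range(63))
print(f"max error with 13 features: {max_err:.2e}")
```

For a smooth target like this one, a handful of features already gives a small error, which is the intuition behind smoothness reducing the number of samples needed.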

Cite This Study

Maran et al. (2024) studied this question.

synapsesocial.com/papers/68e6ac5ab6db64358762eb73 https://doi.org/10.48550/arxiv.2405.06363