Kernel-Based Decentralized Policy Evaluation for Reinforcement Learning

Abstract

We study decentralized nonparametric policy evaluation in reinforcement learning (RL), where multiple agents collaborate to learn the state-value function from sampled state transitions and privately observed rewards. Our approach is a regression-based multistage iteration that performs infinite-dimensional gradient descent (GD) in a reproducing kernel Hilbert space (RKHS). To keep computation and communication tractable, we use the Nyström approximation to project this space onto a finite-dimensional subspace. We establish statistical error bounds that characterize the convergence of the value function estimate; this is the first such analysis in a fully decentralized nonparametric framework. In numerical studies, we compare the regression-based method with the kernel temporal difference (TD) method.
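
As a rough illustration of the Nyström step mentioned in the abstract, the following Python sketch maps a state to a finite-dimensional feature vector whose inner products approximate kernel evaluations. It is not the paper's algorithm: the Gaussian kernel, the `gamma` bandwidth, the three landmark states, and every function name here are assumptions chosen only to make the idea concrete.

```python
import math

def rbf(x, y, gamma=1.0):
    # Gaussian (RBF) kernel; the kernel choice is an assumption for illustration.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def cholesky(A, jitter=1e-10):
    # Lower-triangular L with L L^T = A + jitter * I.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] + jitter - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    # Solve L y = b for lower-triangular L.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def nystrom_features(x, landmarks, L, gamma=1.0):
    # phi(x) = L^{-1} k_m(x), so phi(x) . phi(y) = k_m(x)^T K_mm^{-1} k_m(y).
    k = [rbf(x, z, gamma) for z in landmarks]
    return forward_solve(L, k)

# Hypothetical one-dimensional landmark states.
landmarks = [(0.0,), (0.5,), (1.0,)]
K_mm = [[rbf(a, b) for b in landmarks] for a in landmarks]
L = cholesky(K_mm)

phi_a = nystrom_features((0.0,), landmarks, L)
phi_b = nystrom_features((1.0,), landmarks, L)
approx = sum(p * q for p, q in zip(phi_a, phi_b))
exact = rbf((0.0,), (1.0,))
```

With features of this kind, each agent would only need to store and exchange a weight vector whose length equals the number of landmarks, rather than a function in the full RKHS. At the landmark states themselves the approximation reproduces the kernel up to the small jitter term; elsewhere it is only an approximation.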
