January 1, 1999

Actor-Critic--Type Learning Algorithms for Markov Decision Processes

Abstract

Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm in the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis involves two-time-scale stochastic approximation.
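
To make the two-time-scale idea concrete, here is a minimal sketch under stated assumptions. It is a generic tabular actor-critic, not the specific variants analyzed in the paper, and it does not cover the distributed asynchronous case. Assumptions (all hypothetical, not from the source): a discounted-reward objective, a softmax actor over per-state action preferences `theta`, a TD(0) critic `V`, a toy 2-state MDP, and step sizes b_n = n^-0.6 (critic, fast) and a_n = n^-0.9 (actor, slow), chosen so that a_n / b_n -> 0. The helper names `softmax`, `value_iteration`, and `actor_critic` are illustrative only.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (not from the paper); rewards are maximized.
# P[s, a, s'] is a transition probability, R[s, a] a one-step reward.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1
P[1, 1, 0] = 1.0  # state 1, action 1: move to state 0
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
GAMMA = 0.9


def softmax(x):
    """Numerically stable softmax over a 1-D preference vector."""
    z = np.exp(x - x.max())
    return z / z.sum()


def value_iteration(P, R, gamma, iters=500):
    """Model-based reference solution: optimal values and a greedy policy."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)  # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)


def actor_critic(P, R, gamma, n_steps=20_000, seed=0):
    """Tabular actor-critic driven only by simulated transitions.

    Critic (fast time scale): TD(0) with step size b_n = n**-0.6.
    Actor (slow time scale): softmax preferences with step size a_n = n**-0.9,
    so the critic sees an almost frozen policy while it tracks its values.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    V = np.zeros(n_states)                   # critic: state-value estimates
    theta = np.zeros((n_states, n_actions))  # actor: action preferences
    s = 0
    for n in range(1, n_steps + 1):
        b_n = n ** -0.6  # critic step size (larger, decays slower)
        a_n = n ** -0.9  # actor step size (smaller, decays faster)
        pi = softmax(theta[s])
        a = rng.choice(n_actions, p=pi)
        s_next = rng.choice(n_states, p=P[s, a])    # simulated transition
        delta = R[s, a] + gamma * V[s_next] - V[s]  # TD error
        V[s] += b_n * delta
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0  # gradient of log pi(a | s) w.r.t. theta[s]
        theta[s] += a_n * delta * grad_log_pi
        s = s_next
    return V, theta


if __name__ == "__main__":
    V_star, pi_star = value_iteration(P, R, GAMMA)
    V_hat, theta = actor_critic(P, R, GAMMA)
    print("optimal values:", V_star)  # approximately [19, 20]
    print("optimal policy:", pi_star, "learned greedy policy:", theta.argmax(axis=1))
```

The design choice to watch is the ratio of step sizes: because the actor moves on the slower scale, the critic effectively evaluates a fixed policy between actor updates, which is the separation a two-time-scale analysis relies on. Model knowledge (`P`, `R`) is used only to simulate transitions and to compute the value-iteration reference for comparison.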

Cite This Study

Konda et al. (1999).

synapsesocial.com/papers/6a155b12d73ae7522a4e2a9a
https://doi.org/10.1137/s036301299731669x