January 1, 1999

Actor-Critic--Type Learning Algorithms for Markov Decision Processes

Abstract

Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm in the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis involves two time scale stochastic approximations.
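To make the two time scale structure concrete, here is a minimal tabular sketch in Python. It is a generic actor-critic under stated assumptions, not a reproduction of the paper's algorithms. The MDP is a small discounted-reward problem given as hypothetical tables `P` (next-state probabilities) and `R` (rewards). The actor is a softmax policy over per-state preferences `theta`, and the critic is a TD(0) value estimate `V`. Each state keeps its own visit counter and uses it as a local clock for its step sizes, a simple stand-in for asynchronous updating. The critic step a_k = k^(-0.6) decays more slowly than the actor step b_k = 1/k, so b_k / a_k → 0 and the critic effectively evaluates a quasi-static policy.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index according to the probability list `probs`."""
    u = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def actor_critic(P, R, gamma=0.9, n_steps=50_000, seed=0):
    """Tabular two-timescale actor-critic (illustrative sketch).

    P[s][a] is a list of next-state probabilities and R[s][a] a reward;
    both are assumed inputs for illustration, not the paper's notation.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states                                  # critic: value estimates
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor: softmax preferences
    visits = [0] * n_states                               # per-state local clocks
    s = 0
    for _ in range(n_steps):
        visits[s] += 1
        k = visits[s]
        a_k = 1.0 / k ** 0.6  # critic step size (fast timescale)
        b_k = 1.0 / k         # actor step size (slow timescale), b_k / a_k -> 0
        pi = softmax(theta[s])
        a = sample(pi, rng)
        s_next = sample(P[s][a], rng)                # simulated transition
        delta = R[s][a] + gamma * V[s_next] - V[s]   # TD(0) error
        V[s] += a_k * delta                          # critic update
        for b in range(n_actions):
            # Score function of a tabular softmax policy: 1{a == b} - pi[b].
            grad = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += b_k * delta * grad        # actor update
        s = s_next
    return V, theta


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP: action 1 pays 1, action 0 pays 0,
    # and the next state is uniform regardless of the action taken.
    P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
    R = [[0.0, 1.0], [0.0, 1.0]]
    V, theta = actor_critic(P, R)
    for s in range(2):
        print(s, round(V[s], 2), [round(p, 3) for p in softmax(theta[s])])
```

In this toy MDP the learned policy should put most of its probability on action 1 in both states, and `V` should approach roughly 1 / (1 - 0.9) = 10 as that probability nears one. Per-state counters were chosen so that a state's step sizes shrink only when that state is actually updated, which is one common way to handle asynchronous updates.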

Cite This Study

Konda et al. (1999). Actor-Critic--Type Learning Algorithms for Markov Decision Processes.

synapsesocial.com/papers/6a155b12d73ae7522a4e2a9a https://doi.org/10.1137/s036301299731669x

Also Consider

Synapse lists 5 closely related papers. Consider them for comparative context:

1. The actor-critic algorithm as multi-time-scale stochastic approximation (1997) · 30 citations
2. Stability and convergence of stochastic approximation using the ODE method (2002) · 4 citations
3. Learning algorithms for Markov decision processes (1987) · 27 citations
4. Nonconvergence to Unstable Points in Urn Models and Stochastic Approximations (1990) · 272 citations
5. Smoothing derivatives of functions and applications (1969) · 183 citations