June 6, 2019 · Open Access

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Abstract

Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm within the maximum entropy RL framework which offers greater stability and empirical gains. The choice of policy distribution, a factored Gaussian, is motivated by its easy re-parametrization rather than its modeling power. We introduce Normalizing Flow policies within the SAC framework that learn more expressive classes of policies than simple factored Gaussians. We also present a series of stabilization tricks that enable training of these policies in the RL setting. We show empirically on grid world tasks that our approach increases stability and is better suited to difficult exploration in sparse reward settings.
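To make the idea of a flow-based policy concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which flow family is used, so the planar layer below is an assumption chosen for brevity, and all function names and parameters are hypothetical. The sketch draws a reparameterized sample from a factored Gaussian, pushes it through invertible layers, and corrects the log-density with the change-of-variables formula. The tanh action squashing commonly used with SAC is omitted.

```python
import math
import random


def sample_factored_gaussian(mean, log_std, rng):
    """Reparameterized sample z = mean + std * eps, plus its log-density."""
    z, log_prob = [], 0.0
    for m, ls in zip(mean, log_std):
        eps = rng.gauss(0.0, 1.0)
        z.append(m + math.exp(ls) * eps)
        # log N(z; m, std^2), written in terms of eps
        log_prob += -0.5 * eps * eps - ls - 0.5 * math.log(2.0 * math.pi)
    return z, log_prob


def planar_flow(z, u, w, b):
    """One planar layer f(z) = z + u * tanh(w.z + b) and log|det Jacobian|.

    Assumption: the caller keeps u.w >= -1 so that the layer is invertible.
    """
    pre = sum(wi * zi for wi, zi in zip(w, z)) + b
    h = math.tanh(pre)
    out = [zi + ui * h for zi, ui in zip(z, u)]
    uw = sum(ui * wi for ui, wi in zip(u, w))
    log_det = math.log(abs(1.0 + uw * (1.0 - h * h)))
    return out, log_det


def sample_flow_policy(mean, log_std, layers, rng):
    """Sample an action and its log-probability under the flow policy."""
    z, log_prob = sample_factored_gaussian(mean, log_std, rng)
    for u, w, b in layers:
        z, log_det = planar_flow(z, u, w, b)
        # Change of variables: log p(f(z)) = log p(z) - log|det df/dz|
        log_prob -= log_det
    return z, log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [([0.5, -0.3], [0.8, 0.2], 0.1)]  # hypothetical (u, w, b)
    action, log_prob = sample_flow_policy([0.0, 0.0], [-0.5, -0.5], layers, rng)
    print(action, log_prob)
```

With an empty `layers` list the policy reduces to the factored Gaussian baseline, which makes it easy to compare the two policy classes under the same entropy-regularized objective.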

Cite This Study

Ward et al. (2019) studied this question.

synapsesocial.com/papers/6a17f17c56b3e2ada412ce1a https://doi.org/10.48550/arxiv.1906.02771
