May 8, 2024 · Open Access

Off-Policy Asymptotic and Adaptive Maximum Entropy Deep Reinforcement Learning

Abstract

Maximum entropy deep reinforcement learning has shown great promise in tackling various challenging continuous tasks. By incorporating the maximum entropy framework, the goal is to introduce more randomness in action selection and improve the training process. However, there exists a tradeoff between efficiency and stability, especially when dealing with large-scale tasks with high state and action dimensions. In certain situations, it becomes necessary to constrain the temperature hyperparameter of the maximum entropy term to prevent instability, which can hinder convergence. In this study, we propose an algorithm that combines adaptive and asymptotic maximum entropy with actor-critic random policies. Specifically, we introduce a state-dependent adaptive temperature to accelerate the training process and include an additional term involving asymptotic maximum entropy to ensure stable convergence. These components are combined with the selected critic value to serve as the target Q-value and the surrogate objective in the policy evaluation and improvement steps. The adaptive and asymptotic maximum entropy algorithm demonstrates robust adaptation to the efficiency-stability tradeoff, providing increased exploration and flexibility to address saddle point problems. We evaluate our method on various Gym tasks, and the results indicate that our proposed algorithms outperform several baselines in the domain of continuous control.
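
The abstract does not give the exact formulas, so the following is only a rough sketch of how a target Q-value of this general shape could be assembled. It assumes a standard soft actor-critic style target, takes the "selected critic value" to be the minimum of two target critics, treats the state-dependent temperature as a per-sample array `alpha_next`, and models the asymptotic term with a hypothetical decaying weight `asymptotic_coeff`. None of these names, schedules, or default values come from the paper.

```python
import numpy as np


def asymptotic_coeff(step, beta0=0.2, decay_steps=10_000):
    """Hypothetical schedule (an assumption, not the paper's):
    an extra entropy weight that shrinks toward 0 as training proceeds."""
    return beta0 / (1.0 + step / decay_steps)


def soft_target_q(reward, done, q1_next, q2_next, log_prob_next,
                  alpha_next, step, gamma=0.99):
    """SAC-style target with a per-state temperature plus a decaying
    entropy weight. All arguments except `step` and `gamma` are arrays
    with one entry per sampled transition."""
    # Assumed critic selection: the smaller of two target-critic estimates.
    q_selected = np.minimum(q1_next, q2_next)
    # Per-state temperature plus the global, decaying weight (assumed form).
    entropy_weight = alpha_next + asymptotic_coeff(step)
    # Entropy bonus: -log pi(a'|s') scaled by the combined weight.
    soft_value = q_selected - entropy_weight * log_prob_next
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * soft_value


# Tiny batch of two transitions; the second one is terminal.
targets = soft_target_q(
    reward=np.array([1.0, 0.0]),
    done=np.array([0.0, 1.0]),
    q1_next=np.array([2.0, 5.0]),
    q2_next=np.array([3.0, 4.0]),
    log_prob_next=np.array([-1.0, -2.0]),
    alpha_next=np.array([0.1, 0.3]),
    step=0,
    gamma=0.5,
)
```

In this sketch the per-state temperature can be large where more exploration is wanted, while the decaying weight vanishes late in training, which is one plausible way to trade early exploration against stable convergence.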
