March 31, 2024 | Open Access

Reinforcement Learning Maximized-Actor-Critic (MAC) Method Based on Policy-Gradient

Abstract

Reinforcement learning has attracted significant attention in recent years because of its applicability across many fields. It is being applied experimentally and commercially in areas including games, robotics, and autonomous systems. Reinforcement learning methods are broadly categorized as Value-Based, Policy-Based, and Actor-Critic. Value-Based methods assign a value to every action and select the most valuable action in a given state, while Policy-Based methods assign a probability to every action and select an action in a given state according to those probabilities. With Value-Based methods, computation increases as the learning environment grows, which makes them unable to handle environments that require continuous behavior and degrades performance in larger environments. Policy-Based methods, on the other hand, can be unstable because the model is trained with a policy that selects actions probabilistically. The Actor-Critic method, designed to tackle these issues, still suffers from learning instability because it shares its action-selection approach with Policy-Based methods. In this paper, we propose the Maximized Actor-Critic method, which maximizes the action choices of actors based on the Actor-Critic approach. Whereas the existing Actor-Critic method generates behaviors from a single actor, the proposed method aims to converge on high expected values and stable rewards based on the actor with the maximum score. To evaluate the performance improvement, we added the proposed algorithm to the existing Actor-Critic algorithm and performed a comparative performance analysis. We verified that the proposed algorithm achieves the stability and high expected values observed in previous Actor-Critic-based algorithms.
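The contrast between the two selection rules described in the abstract can be illustrated with a minimal Python sketch. The first two functions show greedy selection over action values (Value-Based) and sampling from action probabilities (Policy-Based). The third function, `max_scored_actor_action`, is only one hypothetical reading of "the actor with the maximum score": it assumes several actors, each with an externally supplied score, and samples from the highest-scored one. The abstract does not specify how actors are scored or combined, so all function names, parameters, and the scoring scheme here are assumptions for illustration, not the paper's algorithm.

```python
import math
import random


def greedy_action(action_values):
    """Value-Based selection: return the index of the highest-valued action."""
    return max(range(len(action_values)), key=lambda a: action_values[a])


def softmax(logits):
    """Turn raw preferences into action probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Policy-Based selection: draw an action index according to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def max_scored_actor_action(actor_logits, actor_scores, rng):
    """Hypothetical reading of 'actor with the maximum score'.

    actor_logits: one list of action preferences per actor (assumed).
    actor_scores: one scalar score per actor, e.g. from a critic (assumed).
    Returns (index of the chosen actor, action sampled from that actor).
    """
    best = max(range(len(actor_logits)), key=lambda i: actor_scores[i])
    return best, sample_action(softmax(actor_logits[best]), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print(greedy_action([0.1, 0.9, 0.3]))
    print(sample_action(softmax([1.0, 2.0, 3.0]), rng))
    print(max_scored_actor_action([[0.0, 0.0], [5.0, -5.0]], [0.2, 0.7], rng))
```

Greedy selection always returns the same action for the same values, whereas sampling can return different actions on repeated calls; that randomness is the source of the training instability the abstract attributes to Policy-Based and Actor-Critic methods.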
