Exploring the Use of Invalid Action Masking in Reinforcement Learning: A Comparative Study of On-Policy and Off-Policy Algorithms in Real-Time Strategy Games

Key Points

Key points are not available for this paper at this time.

Abstract

Invalid action masking is a practical technique in deep reinforcement learning to prevent agents from taking invalid actions. Existing approaches rely on action masking during policy training and utilization. This study focuses on developing reinforcement learning algorithms that incorporate action masking during training but can be used without action masking during policy execution. The study begins by conducting a theoretical analysis to elucidate the distinction between naive policy gradient and invalid action policy gradient. Based on this analysis, we demonstrate that the naive policy gradient is a valid gradient and is equivalent to the proposed composite objective algorithm, which optimizes both the masked policy and the original policy in parallel. Moreover, we propose an off-policy algorithm for invalid action masking that employs the masked policy for sampling while optimizing the original policy. To compare the effectiveness of these algorithms, experiments are conducted using a simplified real-time strategy (RTS) game simulator called Gym-μRTS. Based on empirical findings, we recommend utilizing the off-policy algorithm for addressing most tasks while employing the composite objective algorithm for handling more complex tasks.

Bookmark

View Full Paper

Bookmark

View Full Paper

Exploring the Use of Invalid Action Masking in Reinforcement Learning: A Comparative Study of On-Policy and Off-Policy Algorithms in Real-Time Strategy Games

Key Points

Abstract

Cite This Study

Also Consider

Also Consider