Deep Reinforcement Learning (DRL) models often generalize poorly and lack robustness, so adapting them to environmental changes typically requires costly retraining. This study proposes Test Time Augmentation (TTA) as a post-training method for improving the policy stability of DRL agents. The approach applies TTA to DRL through controlled state perturbation, majority voting over the resulting action choices, and dynamic scaling of augmentations. Because the original model parameters are never modified, the method is a lightweight way for agents to cope with varying conditions. Experiments on the LunarLander-v2 environment using Deep Q-Networks (DQN) show a 4.78% improvement in performance and a 9% improvement in success rate under stable conditions, along with increased resilience to moderate noise. Performance declines in highly chaotic environments, however, which highlights TTA’s limitations under extreme randomness. Overall, the study bridges the gap between TTA in computer vision and DRL, and offers insights into practical, computationally efficient ways to improve policy robustness without retraining.
C. M. Dai (Wed,) studied this question.
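The procedure summarized in the abstract (perturb the state, query the frozen Q-network on each copy, take a majority vote) can be illustrated with a minimal sketch. This is not the study's implementation: `q_values_fn` is a hypothetical stand-in for a trained DQN's forward pass, the Gaussian noise model and tie-breaking rule are assumptions, and `num_augmentations` is only one plausible reading of "dynamic scaling" (more augmented copies when the gap between the two best Q-values is small), since the abstract does not specify the rule.

```python
import numpy as np


def num_augmentations(q_row, n_min=4, n_max=32, margin_ref=1.0):
    """Hypothetical dynamic-scaling rule (an assumption, not from the paper):
    use more augmented copies when the top-two Q-value margin is small."""
    top2 = np.sort(np.asarray(q_row, dtype=np.float64))[-2:]
    margin = top2[1] - top2[0]                    # always >= 0 after sorting
    confidence = min(margin / margin_ref, 1.0)    # clipped to [0, 1]
    return int(round(n_max - confidence * (n_max - n_min)))


def tta_action(q_values_fn, state, n_augment=None, noise_scale=0.01, rng=None):
    """Pick an action by majority vote over perturbed copies of `state`.

    q_values_fn: maps an array of shape (batch, state_dim) to Q-values of
                 shape (batch, n_actions); the model itself is never updated.
    n_augment:   number of perturbed copies; if None, chosen by the
                 hypothetical num_augmentations rule above.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = np.asarray(state, dtype=np.float64)

    if n_augment is None:
        q_clean = q_values_fn(state[None, :])[0]
        n_augment = num_augmentations(q_clean)

    # Controlled state perturbation: additive Gaussian noise (assumed model).
    noise = rng.normal(0.0, noise_scale, size=(n_augment, state.shape[0]))
    batch = np.vstack([state[None, :], state[None, :] + noise])

    # Each copy votes for its greedy action; row 0 is the unperturbed state.
    votes = np.argmax(q_values_fn(batch), axis=1)
    counts = np.bincount(votes, minlength=int(votes.max()) + 1)

    # Tie-break (assumption): prefer the unperturbed state's action.
    winners = np.flatnonzero(counts == counts.max())
    return int(votes[0]) if votes[0] in winners else int(winners[0])
```

In a LunarLander-style loop, `tta_action(q_values_fn, observation)` would replace the usual greedy `argmax` over Q-values at each step. The cost is one batched forward pass over the augmented copies rather than any gradient update.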