ABSTRACT Reinforcement learning (RL) solutions have shown considerable promise for automating the defense of networks against cyber attacks. However, their real‐world deployment is limited by the poor sample efficiency and generalizability of RL agents: even small changes to attack types require a new agent to be trained from scratch. Meta‐learning for RL aims to improve the sample efficiency of training agents by encoding pre‐training information that assists fast adaptation. This work focuses on two key meta‐learning approaches, MAML and ML3, which represent differing approaches to encoding meta‐learning knowledge. Both are limited to sets of environments that share the same action and observation space. To overcome this, we also present an extension to ML3, Gen ML3, that removes this requirement by training the learned loss on the reward information only. Experiments were conducted on a distribution of network setups based on the PrimAITE environment. All approaches demonstrated improved sample efficiency over a PPO baseline across a range of automated cyber defense (ACD) tasks. We also show effective meta‐learning across network topologies with Gen ML3.
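For readers unfamiliar with MAML, the following is a minimal sketch of its inner/outer loop structure: a shared initialization is adapted to each task with a few gradient steps, and the initialization is then updated so that the adapted parameters perform well. It is an illustration only and makes several assumptions that do not come from this work: it uses the first-order approximation (FOMAML) rather than full second-order MAML, a toy scalar regression problem rather than RL with PPO on PrimAITE, and hypothetical function names, tasks, and hyperparameters.

```python
import numpy as np


def loss_and_grad(w, x, y):
    """MSE loss of the scalar model y_hat = w * x, and its gradient w.r.t. w."""
    err = w * x - y
    return float(np.mean(err ** 2)), float(np.mean(2.0 * err * x))


def adapt(w, x, y, inner_lr=0.1, steps=1):
    """Inner loop: a few plain gradient steps on one task's support data."""
    for _ in range(steps):
        _, grad = loss_and_grad(w, x, y)
        w = w - inner_lr * grad
    return w


def fomaml(task_slopes, meta_steps=200, inner_lr=0.1, outer_lr=0.05, seed=0):
    """Outer loop: learn an initialization w that adapts quickly to each task.

    Each toy task is y = a * x for a slope a in task_slopes (an assumption
    made for illustration). First-order approximation: the query-set gradient
    at the adapted parameter is applied directly to the initialization.
    """
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for a in task_slopes:
            x_support = rng.uniform(-1.0, 1.0, 10)
            x_query = rng.uniform(-1.0, 1.0, 10)
            w_task = adapt(w, x_support, a * x_support, inner_lr)
            _, grad = loss_and_grad(w_task, x_query, a * x_query)
            meta_grad += grad
        w -= outer_lr * meta_grad / len(task_slopes)
    return w


if __name__ == "__main__":
    w_meta = fomaml([1.0, 2.0, 3.0])
    x = np.linspace(-1.0, 1.0, 20)
    before, _ = loss_and_grad(w_meta, x, 3.0 * x)
    after, _ = loss_and_grad(adapt(w_meta, x, 3.0 * x, steps=5), x, 3.0 * x)
    print(f"meta-init={w_meta:.3f}  loss before={before:.4f}  after={after:.4f}")
```

In this toy setting the learned initialization settles between the task slopes, so a handful of inner steps is enough to reduce the loss on any one task. The sample-efficiency argument in the abstract rests on the same idea, applied to RL policies instead of a scalar weight.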
Andrew Thomas
Nick Tillyer
Applied AI Letters
Thomas et al. studied this question.
www.synapsesocial.com/papers/69401d622d562116f28f8e2a — DOI: https://doi.org/10.1002/ail2.70009