December 9, 2025 · Open Access

Meta Reinforcement Learning for Automated Cyber Defence

Key Points

Aimed to enhance automated cyber defense by improving the sample efficiency of reinforcement learning agents.
Applied two meta-learning approaches, MAML and ML3, to reinforcement learning agents.
Developed Gen ML3, an extension of ML3 that trains the learned loss on reward information only, so agents can be meta-trained across environments that do not share the same action and observation spaces.
Conducted experiments on a distribution of network setups based on the PrimAITE environment.
Demonstrated improvements in sample efficiency compared to a PPO baseline.
Showed effective meta-learning across various network topologies with Gen ML3.
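MAML, one of the two approaches named above, meta-learns a shared initialization from which a few gradient steps adapt well to a new task. The Python sketch below is a toy illustration of that two-level optimization only. It assumes a family of 1-D quadratic tasks (theta - c_i)^2 rather than the PPO agents and PrimAITE tasks used in the paper, and it reuses the same loss for the inner and outer steps, whereas MAML normally evaluates the adapted parameters on held-out data. The names task_grad, adapt, meta_gradient and maml_train, and all numeric values, are hypothetical.

```python
# Toy sketch of MAML's inner/outer optimization (not the paper's PPO setup).
# Assumption: each task i is a 1-D quadratic loss (theta - c_i)^2, so every
# gradient, including the one through the inner step, has a closed form.


def task_grad(theta: float, c: float) -> float:
    """Gradient of the task loss (theta - c)^2 with respect to theta."""
    return 2.0 * (theta - c)


def adapt(theta: float, c: float, alpha: float) -> float:
    """Inner loop: one gradient step on a single task from the shared init."""
    return theta - alpha * task_grad(theta, c)


def meta_gradient(theta: float, tasks: list[float], alpha: float) -> float:
    """Outer-loop gradient of the mean post-adaptation loss.

    Chain rule: d/dtheta L(adapt(theta)) = L'(adapt(theta)) * d adapt/d theta,
    and d adapt/d theta = 1 - 2 * alpha for these quadratic tasks, so this is
    the exact (second-order) MAML gradient, not a first-order approximation.
    """
    total = 0.0
    for c in tasks:
        theta_adapted = adapt(theta, c, alpha)
        total += task_grad(theta_adapted, c) * (1.0 - 2.0 * alpha)
    return total / len(tasks)


def maml_train(tasks: list[float], alpha: float = 0.1, beta: float = 0.05,
               steps: int = 200, theta: float = 0.0) -> float:
    """Learn an initialization that performs well after one inner step."""
    for _ in range(steps):
        theta -= beta * meta_gradient(theta, tasks, alpha)
    return theta


if __name__ == "__main__":
    train_tasks = [1.0, 2.0, 6.0]  # hypothetical task parameters c_i
    theta0 = maml_train(train_tasks)
    print(f"meta-learned init: {theta0:.4f}")
    new_task = 4.0  # hypothetical unseen task
    print(f"after one step on new task: {adapt(theta0, new_task, 0.1):.4f}")
```

For this task family the meta-learned initialization converges to the mean of the task parameters (3.0 for the three tasks above), the point that minimizes the average post-adaptation loss. With real RL tasks no closed form exists, and the outer gradient has to be estimated from rollouts.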

Abstract

Reinforcement learning (RL) solutions have shown considerable promise for automating the defense of networks against cyber attacks. However, a limitation on their real-world deployment is the limited sample efficiency and generalizability of RL agents: even small changes to attack types require a new agent to be trained from scratch. Meta-learning for RL aims to improve the sample efficiency of training agents by encoding pre-training information that assists fast adaptation. This work focuses on two key meta-learning approaches, MAML and ML3, which represent differing ways of encoding meta-learning knowledge. Both approaches are limited to sets of environments that share the same action and observation spaces. To overcome this, we also present an extension to ML3, Gen ML3, that removes this requirement by training the learned loss on reward information only. Experiments were conducted on a distribution of network setups based on the PrimAITE environment. All approaches demonstrated improvements in sample efficiency over a PPO baseline across a range of automated cyber defense (ACD) tasks. We also show effective meta-learning across network topologies with Gen ML3.
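The abstract's key idea for Gen ML3 is that a learned loss whose only inputs are reward information does not depend on the size of the observation or action space. The Python sketch below illustrates that property under several assumptions that do not come from the paper: the environment is a multi-armed bandit with deterministic arm rewards instead of PrimAITE; the learned loss is a one-layer tanh map from a reward to a weight on log pi(a), with hypothetical parameters PHI; the expected gradient is computed exactly by enumerating arms rather than from sampled rollouts; and the outer meta-training loop that would fit PHI is omitted. All function names are illustrative.

```python
import math

# Hypothetical learned-loss parameters. In an ML3-style method these would be
# meta-trained in an outer loop; that loop is omitted from this sketch.
PHI = (2.0, -1.0)


def softmax(logits: list[float]) -> list[float]:
    """Convert logits to action probabilities (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def learned_weight(reward: float, phi: tuple[float, float]) -> float:
    """Reward-only learned loss component: the weight applied to log pi(a).

    The only input is a scalar reward, so phi keeps the same shape for every
    environment, whatever its observation or action space.
    """
    return math.tanh(phi[0] * reward + phi[1])


def surrogate_grad(logits: list[float], arm_rewards: list[float],
                   phi: tuple[float, float]) -> list[float]:
    """Expected gradient of the surrogate loss -w(r_a) * log pi(a) w.r.t. logits.

    The expectation over a ~ pi is taken exactly by enumerating arms (bandit
    stand-in, no sampling): component k is -pi_k * (w_k - sum_a pi_a * w_a).
    """
    pi = softmax(logits)
    weights = [learned_weight(r, phi) for r in arm_rewards]
    mean_w = sum(p * w for p, w in zip(pi, weights))
    return [-p * (w - mean_w) for p, w in zip(pi, weights)]


def adapt_policy(arm_rewards: list[float], phi: tuple[float, float] = PHI,
                 lr: float = 0.5, steps: int = 2000) -> list[float]:
    """Inner loop: minimize the learned loss for a bandit with any number of arms."""
    logits = [0.0] * len(arm_rewards)
    for _ in range(steps):
        grad = surrogate_grad(logits, arm_rewards, phi)
        logits = [logit - lr * g for logit, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    # The same PHI adapts a 3-arm and a 5-arm policy; no reshaping is needed.
    for rewards in ([0.1, 0.3, 0.9], [0.2, 0.4, 0.05, 0.95, 0.3]):
        probs = adapt_policy(rewards)
        print(len(probs), [round(p, 3) for p in probs])
```

Because the loss parameters only ever see a scalar reward, the same PHI drives both the 3-arm and the 5-arm policy toward its best arm. In ML3-style training, PHI itself would typically be updated in an outer loop by differentiating post-adaptation performance through these inner updates.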
