
September 10, 2025 · Open Access

Multi‐Objective Reinforcement Learning for Automated Resilient Cyber Defence

Key Points

Multi-objective reinforcement learning improves network defense by considering multiple factors simultaneously.
Performance comparison reveals that MORL agents outperform single-objective approaches in managing network defense complexities.
The approach involves using two algorithms, MOPPO and PCN, to create agents capable of balancing conflicting objectives.
This research emphasizes the need for adaptive models that can address the dynamic nature of cybersecurity threats.

Abstract

Cyber‐attacks pose a security threat to military command and control networks, Intelligence, Surveillance, and Reconnaissance (ISR) systems, and civilian critical national infrastructure. The use of artificial intelligence and autonomous agents in these attacks increases the scale, range, and complexity of this threat and the subsequent disruption they cause. Autonomous Cyber Defence (ACD) agents aim to mitigate this threat by responding at machine speed and at the scale required to address the problem. Additionally, they reduce the burden on the limited number of human cyber experts available to respond to an attack. Sequential decision‐making algorithms such as Deep Reinforcement Learning (RL) provide a promising route to create ACD agents. These algorithms focus on a single objective, such as minimising the intrusion of red agents on the network, by using a handcrafted weighted sum of rewards. This approach removes the ability to adapt the model during inference, and fails to address the many competing objectives present when operating and protecting these networks. Conflicting objectives, such as restoring a machine from a back‐up image, must be carefully balanced with the cost of associated down‐time or the disruption to network traffic or services that might result. Instead of pursuing a Single‐Objective RL (SORL) approach, here we present a simple example of a multi‐objective network defense game that requires consideration of both defending the network against red‐agents and maintaining the critical functionality of green‐agents. Two Multi‐Objective Reinforcement Learning (MORL) algorithms, namely Multi‐Objective Proximal Policy Optimization (MOPPO) and Pareto‐Conditioned Networks (PCN), are used to create two trained ACD agents whose performance is compared on our Multi‐Objective Cyber Defense game. The benefits and limitations of MORL ACD agents in comparison to SORL ACD agents are discussed based on the investigations of this game.
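The contrast the abstract draws between a handcrafted weighted sum and a multi‐objective treatment can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two‐component return vector (a red‐agent objective and a green‐agent objective), the numeric values, and the helper names `scalarize`, `dominates`, `pareto_front`, and `best_under_weights` are all assumptions made for illustration. The sketch shows that fixing the weights in advance selects a single trade‐off, whereas keeping the set of non‐dominated returns leaves the choice of trade‐off open until it is needed.

```python
from typing import List, Sequence, Tuple

# Hypothetical vector return: (red_objective, green_objective).
# Higher is better for both; the values below are invented for illustration.
Return = Tuple[float, float]


def scalarize(ret: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a vector return into one number with a fixed weighted sum."""
    return sum(w * r for w, r in zip(weights, ret))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Return]) -> List[Return]:
    """Keep only the returns that no other return dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_under_weights(points: List[Return], weights: Sequence[float]) -> Return:
    """Pick the return a single-objective agent with these weights would prefer."""
    return max(points, key=lambda p: scalarize(p, weights))


# Invented episode returns for four hypothetical defence policies.
returns: List[Return] = [(-2.0, -9.0), (-5.0, -4.0), (-9.0, -1.0), (-6.0, -6.0)]

front = pareto_front(returns)
balanced = best_under_weights(returns, (0.5, 0.5))
red_heavy = best_under_weights(returns, (0.9, 0.1))
green_heavy = best_under_weights(returns, (0.1, 0.9))

print("Pareto front:", front)
print("Balanced weights pick:", balanced)
print("Red-heavy weights pick:", red_heavy)
print("Green-heavy weights pick:", green_heavy)
```

In this toy data each weight vector picks a different point on the front, which is the sense in which a single fixed weighting commits to one trade‐off ahead of time.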
