What question did this study set out to answer?

The review aims to consolidate knowledge on data science and machine learning applications in cyber intrusion detection.

March 25, 2026 | Open Access

Data Science and Machine Learning for Cyber Intrusion Detection: A Systematic Review

Key Points

Systematic literature review of 153 studies from 2009 to 2025
Categorization of data science and machine learning techniques
Quantitative meta-analysis of benchmark datasets and algorithm usage
Identification of research gaps and emerging trends
UNSW-NB15 and CIC-IDS2017 together account for 71% of dataset usage
Deep learning algorithms constitute 40% of algorithmic approaches
Only 34% of studies report per-class recall for minority attack classes
Nine key research gaps identified
Eight emerging trends in intrusion detection technologies identified

Abstract

The escalating sophistication and volume of cyberattacks have driven an urgent demand for intelligent Intrusion Detection Systems (IDS) that leverage Data Science (DS) and Machine Learning (ML). Despite rapid advances, existing reviews often focus narrowly on specific aspects without integrating the full DS and ML lifecycle. This paper presents a systematic review of DS and ML applications in cyber intrusion detection, covering 153 studies published from 2009 to 2025. The review surveys benchmark datasets, data preprocessing and feature engineering techniques, classical ML and Deep Learning (DL) models, ensemble and hybrid strategies, class imbalance handling, and evaluation methodologies. A unified four-axis taxonomy is proposed to classify the literature by learning strategy, imbalance handling, explainability level, and deployment context. A quantitative meta-analysis reveals that UNSW-NB15 and CIC-IDS2017 together account for 71% of dataset usage, DL represents 40% of algorithmic approaches, and only 34% of studies report per-class recall for minority attack types. Nine technically grounded research gaps are identified, spanning preprocessing standardization, cross-dataset evaluation, minority-class recall optimization, adversarial robustness, online and edge deployment, explainability for Security Operations Center (SOC) operations, federated learning, transformer and Large Language Model (LLM) applications, and zero-shot adaptation. The review further identifies eight emerging trends: attention-based and transformer architectures, LLMs, Graph Neural Networks (GNNs), federated and privacy-preserving learning, adversarial robustness, Explainable AI (XAI), zero-shot and few-shot detection, and Internet of Things (IoT) edge-based IDS.
A seven-stage actionable architecture is proposed that integrates adaptive preprocessing, contrastive feature learning, recall-aware ensemble detection, XAI decision support, continual learning, and federated aggregation. This review provides researchers and practitioners with a structured roadmap for advancing the next generation of intelligent cyber intrusion detection systems.
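
The emphasis on per-class recall for minority attack types can be illustrated with a small sketch. The labels, counts, and helper names below are hypothetical and chosen only for illustration; they are not taken from the reviewed studies. The sketch shows how a detector can score high overall accuracy on imbalanced traffic while missing most of the rare attacks, which is why reporting recall per class matters.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return {label: recall}, where recall = TP / (TP + FN) for each true label."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return {label: hits[label] / support[label] for label in support}


# Hypothetical toy data: 95 benign flows and 5 flows from a rare attack class.
y_true = ["benign"] * 95 + ["rare_attack"] * 5
# Hypothetical detector: no false alarms, but it catches only 1 of the 5 attacks.
y_pred = ["benign"] * 95 + ["rare_attack"] + ["benign"] * 4

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

print(f"accuracy: {overall:.2f}")
for label, value in sorted(recalls.items()):
    print(f"recall[{label}]: {value:.2f}")
```

In this toy case the overall accuracy is 0.96 while recall on the rare attack class is only 0.20, a gap that an aggregate metric alone would hide.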
