The escalating sophistication and volume of cyberattacks have driven an urgent demand for intelligent Intrusion Detection Systems (IDS) that leverage Data Science (DS) and Machine Learning (ML). Despite rapid advances, existing reviews often focus narrowly on specific aspects without integrating the full data science and machine learning lifecycle. This paper presents a systematic review of DS and ML applications in cyber intrusion detection, covering 153 studies published from 2009 to 2025. The review surveys benchmark datasets, data preprocessing and feature engineering techniques, classical ML and Deep Learning (DL) models, ensemble and hybrid strategies, class imbalance handling, and evaluation methodologies. A unified four-axis taxonomy is proposed to classify the literature by learning strategy, imbalance handling, explainability level, and deployment context. A quantitative meta-analysis reveals that UNSW-NB15 and CIC-IDS2017 dominate, together accounting for 71% of dataset usage; that deep learning represents 40% of algorithmic approaches; and that only 34% of studies report per-class recall for minority attack types. Nine technically grounded research gaps are identified, spanning preprocessing standardization, cross-dataset evaluation, minority-class recall optimization, adversarial robustness, online and edge deployment, explainability for Security Operations Center (SOC) operations, federated learning, transformer and Large Language Model (LLM) applications, and zero-shot adaptation. The review further identifies eight emerging trends: attention-based and transformer architectures, LLMs, Graph Neural Networks (GNNs), federated and privacy-preserving learning, adversarial robustness, Explainable AI (XAI), zero-shot and few-shot detection, and Internet of Things (IoT) edge-based IDS.
A seven-stage actionable architecture is proposed that integrates adaptive preprocessing, contrastive feature learning, recall-aware ensemble detection, XAI decision support, continual learning, and federated aggregation. This review provides researchers and practitioners with a structured roadmap for advancing the next generation of intelligent cyber intrusion detection systems.
Yali Ren studied this question.