What question did this study set out to answer?

May 6, 2026 · Open Access

A Rigorous Comparative Study of Supervised Machine Learning Techniques for Network Anomaly Detection: Empirical Insights from the UNSW-NB15 Dataset

Key Points

To compare supervised machine learning techniques for network anomaly detection using the UNSW-NB15 dataset.
Evaluated Decision Tree, Random Forest, Support Vector Machine, and XGBoost.
Employed a structured preprocessing pipeline and five-fold stratified cross-validation.
Assessed model performance using accuracy, precision, recall, F1-score, and area under the ROC curve.
XGBoost achieved the highest accuracy of 0.97 and AUC of 0.98.
Ensemble methods outperformed individual classifiers in anomaly detection.
Identified flow-based and temporal features, such as sttl, sload, and dload, as key to distinguishing normal from malicious traffic.
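
Five-fold stratified cross-validation keeps the normal-to-attack ratio roughly constant in every fold, so no fold is scored on an unrepresentative class mix. Below is a minimal standard-library sketch of the splitting step. It assumes binary labels (0 = normal, 1 = attack). `stratified_kfold_indices` is a hypothetical helper and the 4:1 toy labels are made up. In practice, scikit-learn's `StratifiedKFold` provides the same behaviour, and the study's own implementation is not shown here.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=5, seed=0):
    """Split sample indices into n_splits folds that preserve the class ratio.

    Hypothetical helper for illustration; returns a list of (train, test) index lists.
    """
    # Group sample indices by class label (e.g. 0 = normal, 1 = attack).
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal this class round-robin across folds so each fold gets a near-equal share;
        # the running offset also keeps the overall fold sizes balanced.
        for pos, i in enumerate(members):
            folds[(offset + pos) % n_splits].append(i)
        offset += len(members)

    splits = []
    for k in range(n_splits):
        test = sorted(folds[k])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, test))
    return splits


# Toy labels with a 4:1 normal-to-attack ratio (not UNSW-NB15 data).
labels = [0] * 80 + [1] * 20
for k, (train, test) in enumerate(stratified_kfold_indices(labels)):
    attacks = sum(labels[i] for i in test)
    print(f"fold {k}: train={len(train)} test={len(test)} attacks_in_test={attacks}")
```

With these toy labels, every fold trains on 80 samples and tests on 20, four of which are attacks. Each model (Decision Tree, Random Forest, SVM, XGBoost) would then be fitted on each `train` split and scored on the matching `test` split.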

Abstract

The increasing complexity of modern network infrastructures has intensified the need for reliable and efficient intrusion detection systems. While advanced deep learning approaches have demonstrated strong performance, their high computational cost and limited interpretability restrict their practical deployment in real-time environments. This study presents a systematic empirical evaluation of four supervised machine learning models—Decision Tree, Random Forest, Support Vector Machine (SVM), and XGBoost—for network anomaly detection using the UNSW-NB15 dataset. To ensure methodological rigor, a structured preprocessing pipeline and a five-fold stratified cross-validation framework were employed. Model performance was assessed using multiple evaluation metrics, including accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). In addition, a feature importance analysis was conducted to identify the most influential network traffic attributes contributing to anomaly detection. The results show that ensemble-based methods outperform individual classifiers, with XGBoost achieving the best overall performance (accuracy = 0.97, AUC = 0.98) along with high stability across validation folds. The analysis further reveals that a subset of flow-based and temporal features—such as sttl, sload, and dload—plays a critical role in distinguishing between normal and malicious traffic. This study provides a rigorous, interpretable, and reproducible benchmarking framework for supervised machine learning in network anomaly detection. The findings provide practical insights for developing efficient and scalable intrusion detection systems suitable for real-world deployment.
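
The evaluation metrics listed in the abstract can be computed directly from a fold's true labels and predicted attack scores. The standard-library sketch below assumes a binary setup (1 = attack, 0 = normal) and a 0.5 decision threshold. `binary_metrics` and `summarize_folds` are hypothetical names, and the study may well have relied on a library such as scikit-learn (`roc_auc_score` and related functions) instead. Reporting the per-fold mean and standard deviation is one common way to quantify the stability across validation folds that the abstract mentions.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1 and ROC AUC for binary labels (1 = attack).

    Assumes both classes are present, which stratified folds guarantee.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # AUC = probability that a random attack scores above a random normal sample,
    # counting ties as half (the Mann-Whitney formulation). This is O(n_pos * n_neg):
    # fine for a sketch, too slow for the full dataset.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "auc": auc}


def summarize_folds(fold_metrics):
    """Mean and sample standard deviation of each metric across CV folds."""
    return {key: (mean(m[key] for m in fold_metrics), stdev(m[key] for m in fold_metrics))
            for key in fold_metrics[0]}


# Toy fold: six flows with made-up model scores (not results from the study).
m = binary_metrics([0, 0, 1, 1, 1, 0], [0.1, 0.6, 0.8, 0.4, 0.9, 0.3])
print({k: round(v, 3) for k, v in m.items()})
# accuracy, precision, recall and F1 are 0.667; AUC is 0.889
```

Accuracy, precision, recall and F1 depend on the chosen threshold, while AUC depends only on how the scores rank attacks above normal traffic. This is why the two kinds of metric are usually reported side by side.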

Nouf Alkhater studied this question.

synapsesocial.com/papers/69fa986a04f884e66b5323d2
DOI: https://doi.org/10.3390/computers15050285
