What question did this study set out to answer?

This study evaluates how three optimized machine learning classifiers (Random Forest, K-Nearest Neighbors, and Support Vector Machine) compare on binary network intrusion detection.

May 9, 2026

A Comparative Evaluation of Optimized Machine Learning Classifiers for Network Intrusion Detection: Random Forest, K-Nearest Neighbors, and Support Vector Machine Approaches

Key Points

Compares three optimized machine learning classifiers for binary network intrusion detection.
The classifiers are Random Forest, K-Nearest Neighbors, and Support Vector Machine.
Recursive Feature Elimination selected 10 features, and hyperparameters were tuned by Bayesian optimization with Optuna.
Models were trained and evaluated on the KDD Cup 1999 dataset with a 70/30 stratified split.
Random Forest achieved the highest accuracy of 99.62% and F1-score of 99.59%.
K-Nearest Neighbors had an accuracy of 98.25% and F1-score of 98.11%.
Support Vector Machine showed an accuracy of 95.82% and F1-score of 95.46%.
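
The pipeline outlined in the key points can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it uses scikit-learn, swaps in a small synthetic dataset from `make_classification` in place of the preprocessed KDD Cup 1999 features, and leaves out the Optuna tuning step. The estimator sizes and the `random_state` values are assumptions chosen only so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed KDD Cup 1999 features (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)

# 70/30 stratified split: class proportions are preserved in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Recursive Feature Elimination: repeatedly drop the least important
# feature until 10 remain. The ranking estimator is an assumption.
selector = RFE(
    RandomForestClassifier(n_estimators=25, random_state=0),
    n_features_to_select=10,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Fit the final classifier on the selected features only.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train_sel, y_train)
y_pred = model.predict(X_test_sel)

accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={accuracy:.4f}  f1={f1:.4f}")
```

Fitting the selector on the training portion only keeps information from the held-out 30% out of the feature ranking. The scores printed here describe the synthetic data and are not comparable to the figures reported in the paper.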

Abstract

The proliferation of sophisticated cyber threats necessitates intelligent intrusion detection systems (IDS) capable of accurately distinguishing between normal and anomalous network traffic. This study presents a comparative evaluation of three optimized machine learning classifiers, Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM), for binary network intrusion detection. The methodology integrates Recursive Feature Elimination (RFE) for dimensionality reduction, selecting 10 discriminative features, and Bayesian hyperparameter optimization using the Optuna framework. Models were trained and evaluated on the KDD Cup 1999 dataset with a 70/30 stratified split. Results demonstrate that RF achieved the highest performance (accuracy: 99.62%, F1-score: 99.59%), followed by KNN (accuracy: 98.25%, F1-score: 98.11%) and SVM (accuracy: 95.82%, F1-score: 95.46%). Ten-fold stratified cross-validation confirmed the robustness of these findings (RF: 99.64% ± 0.12%, KNN: 98.41% ± 0.16%, SVM: 96.10% ± 0.30%). McNemar’s test established that all pairwise performance differences are statistically significant (p < 0.001). Computational cost analysis revealed that RF offers the best accuracy-efficiency balance, with training in 0.098s and prediction in 0.0025s. A Streamlit-based web application was developed for real-time inference. The findings underscore the efficacy of combining automated feature selection with Bayesian optimization for high-performance IDS.
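
To make the significance claim concrete, here is a hedged sketch of how McNemar's test compares two classifiers on the same test set. The abstract does not say which variant was used, so the exact (binomial) form below is an assumption. The function name `mcnemar_exact` and the toy labels are hypothetical. Only the discordant pairs, where exactly one of the two models is correct, enter the test.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts samples that model A gets
    right and model B gets wrong, and c counts the reverse.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        # The models never disagree on correctness: no evidence of a difference.
        return b, c, 1.0
    # Under the null hypothesis each discordant pair favours A or B with
    # probability 1/2, so min(b, c) follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy example (hypothetical labels, not results from the paper).
y_true = [1] * 10
pred_a = [1] * 10            # model A is correct on every sample
pred_b = [0] * 8 + [1] * 2   # model B is wrong on the first 8 samples
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)
```

A small p-value indicates that the two models' error patterns differ by more than chance would explain. On test sets as large as the one in the study, a chi-squared approximation is a common alternative to the exact form.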
