The proliferation of sophisticated cyber threats calls for intelligent intrusion detection systems (IDS) that can accurately distinguish normal from anomalous network traffic. This study compares three optimized machine learning classifiers for binary network intrusion detection: Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The methodology combines Recursive Feature Elimination (RFE), which reduces the feature space to 10 discriminative features, with Bayesian hyperparameter optimization using the Optuna framework. Models were trained and evaluated on the KDD Cup 1999 dataset using a stratified 70/30 train/test split. RF achieved the highest performance (accuracy: 99.62%, F1-score: 99.59%), followed by KNN (accuracy: 98.25%, F1-score: 98.11%) and SVM (accuracy: 95.82%, F1-score: 95.46%). Stratified ten-fold cross-validation confirmed the robustness of these results (RF: 99.64% ± 0.12%, KNN: 98.41% ± 0.16%, SVM: 96.10% ± 0.30%). McNemar's test showed that all pairwise performance differences are statistically significant (p < 0.001). In terms of computational cost, RF offered the best balance between accuracy and efficiency, training in 0.098 s and predicting in 0.0025 s. A Streamlit-based web application was also developed for real-time inference. These findings underscore the effectiveness of combining automated feature selection with Bayesian optimization for high-performance IDS.
Makinde et al. studied this question.
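The claim that every pairwise difference is significant rests on McNemar's test. This test compares two classifiers on the same test samples using only the discordant cases: those that one model classifies correctly and the other does not. The following is a minimal sketch in Python. It assumes the chi-square form with Edwards' continuity correction, since the abstract does not say whether the corrected, uncorrected, or exact binomial variant was used. The function name `mcnemar_test` is hypothetical. For one degree of freedom, the chi-square tail probability equals `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math
from typing import Sequence, Tuple


def mcnemar_test(
    y_true: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
) -> Tuple[float, float, int, int]:
    """McNemar's test (chi-square, continuity-corrected) for two classifiers.

    Hypothetical helper, assuming Edwards' correction. Returns
    (statistic, p_value, only_a_correct, only_b_correct).
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    only_a_correct = 0  # samples model A gets right and model B gets wrong
    only_b_correct = 0  # samples model B gets right and model A gets wrong
    for truth, a, b in zip(y_true, pred_a, pred_b):
        a_ok = a == truth
        b_ok = b == truth
        if a_ok and not b_ok:
            only_a_correct += 1
        elif b_ok and not a_ok:
            only_b_correct += 1

    discordant = only_a_correct + only_b_correct
    if discordant == 0:
        # The models never disagree on correctness: no evidence of a difference.
        return 0.0, 1.0, only_a_correct, only_b_correct

    # Chi-square statistic with 1 degree of freedom and continuity correction.
    stat = (abs(only_a_correct - only_b_correct) - 1) ** 2 / discordant
    # Upper-tail probability of chi-square(1): P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, only_a_correct, only_b_correct


if __name__ == "__main__":
    # Toy labels (illustrative only, not data from the study).
    y = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    rf_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    svm_pred = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
    stat, p, b, c = mcnemar_test(y, rf_pred, svm_pred)
    print(f"b={b}, c={c}, chi2={stat:.3f}, p={p:.4f}")
```

In practice, `pred_a` and `pred_b` would be the two models' predictions on the same held-out 30% test split. With a test set as large as KDD Cup 1999's, even small accuracy gaps yield many discordant samples, which is consistent with the very small p-values reported.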