What question did this study set out to answer?

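The study asks how robust four machine-learning models for network intrusion detection are against adversarial attacks, and whether a four-component defense pipeline improves that robustness consistently across datasets.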

April 19, 2026 | Open Access

Toward Adversarial Robustness Network Intrusion Detection Based on Multi-Model Ensemble Approach

Key Points

Compared four models: XGBoost, LightGBM, TabNet, and Residual MLP.
Tested on two datasets: RT_IOT2022 and Web_IDS23.
Conducted standard attacks and adaptive evaluations, including sample-fraction sensitivity and computational overhead measurements.
Performed per-class F1 analysis and repeated-run significance tests.
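On RT_IOT2022, the tree-based models (XGBoost, LightGBM) closed most of the robustness gap under strong attacks, but often only after large reductions in clean accuracy.
On the same dataset, Residual MLP achieved a more favorable balance between robustness and clean accuracy, while the full defense stack over-regularized TabNet.
On Web_IDS23, simpler defense configurations (adversarial training only, or ensemble only) frequently outperformed the full four-stage pipeline in absolute clean and attacked accuracy.
Median filtering was the most fragile component: larger filter windows substantially degraded both clean and attacked accuracy, whereas contamination rate, anomaly-mixing weight, and ensemble size were comparatively stable.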
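
To make the median-filtering point concrete, here is a minimal sketch of how a larger filter window can erase informative structure in a tabular feature vector. The summary does not say how the study applies the filter, so this assumes a sliding median along the feature axis of each sample with edge padding; the function name `median_filter_features` and the toy data are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_features(X, window):
    """Sliding median along the feature axis of each row (assumed setup).

    X is an (n_samples, n_features) array; window must be a positive odd
    integer. Edges are padded by repeating the first/last feature value.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(X, ((0, 0), (half, half)), mode="edge")
    # Shape (n_samples, n_features, window): one window per feature position.
    windows = sliding_window_view(padded, window, axis=1)
    return np.median(windows, axis=-1)


# Toy sample: two adjacent features carry the signal, the rest are zero.
X = np.array([[0.0, 8.0, 8.0, 0.0, 0.0, 0.0, 0.0]])

for window in (1, 3, 5):
    filtered = median_filter_features(X, window)
    distortion = float(np.abs(filtered - X).sum())
    print(f"window={window}: filtered={filtered[0]}, L1 distortion={distortion}")
```

In this toy case a window of 3 leaves the sample untouched, while a window of 5 flattens it to all zeros. That is one plausible mechanism, not necessarily the one at work in the study, by which wider windows could hurt clean and attacked accuracy alike.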

Abstract

Machine learning-based network intrusion detection systems (NIDSs) remain vulnerable to adversarial manipulation, but the robustness literature for tabular NIDS data is still dominated by single-model, single-dataset, and non-adaptive evaluations. In this paper, we reposition the manuscript as a comparative robustness study of a four-component defense pipeline rather than as a claim of a universal defense primitive. We evaluate XGBoost, LightGBM, TabNet, and Residual MLP on RTIOT2022 and WebIDS23 under standard attacks, representative constrained/adaptive attacks, component-wise ablations, sample-fraction sensitivity, repeated-run significance tests, per-class F1 analysis, and computational-overhead measurements. The results show strong dataset and architecture dependence. On RTIOT2022, tree-based models close most of the robustness gap under strong attacks but often only after large clean-accuracy reductions; Residual MLP achieves a more favorable balance, while the full defense stack over-regularizes TabNet. On WebIDS23, aggregate robustness-gap reduction remains positive, yet simpler baselines such as adversarial-training-only or ensemble-only configurations frequently outperform the full four-stage pipeline in absolute clean/attack accuracy. Across both datasets, median filtering is the most fragile component: larger filter windows substantially degrade both clean and attacked accuracy, whereas contamination rate, anomaly-mixing weight, and ensemble size are comparatively stable. Representative constrained/adaptive evaluations reduce performance only modestly relative to standard FGSM/PGD, but per-class and overhead analyses show that minority-class collapse and training cost remain important deployment limitations. These findings support a more cautious conclusion: adversarial defense for tabular NIDS is validation driven and dataset specific, and the full defense stack should not be treated as a universal default.
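
The abstract refers to FGSM attacks and to a robustness gap without defining either on this page. The sketch below illustrates both under stated assumptions: the gap is taken to be clean accuracy minus attacked accuracy, the attack is single-step FGSM, and the victim is a fixed logistic-regression model chosen only because its input gradient has a closed form. The weights, data, and function names are hypothetical and are not taken from the study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(X, w, b):
    """Hard 0/1 predictions of a logistic-regression model."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


def fgsm(X, y, w, b, eps):
    """Single-step FGSM: move each input by eps along the sign of the loss gradient.

    For binary cross-entropy on a logistic model, d(loss)/dx = (p - y) * w.
    """
    p = sigmoid(X @ w + b)
    grad = (p - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad)


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


def robustness_gap(X, y, w, b, eps):
    """Return (clean accuracy, attacked accuracy, gap), with gap = clean - attacked (assumed definition)."""
    clean = accuracy(y, predict(X, w, b))
    attacked = accuracy(y, predict(fgsm(X, y, w, b, eps), w, b))
    return clean, attacked, clean - attacked


# Hypothetical model and data: two samples far from the decision boundary,
# two samples close to it.
w = np.array([1.0, -1.0])
b = 0.0
X = np.array([[2.0, 0.0], [0.2, 0.0], [0.0, 2.0], [0.0, 0.2]])
y = np.array([1, 1, 0, 0])

clean, attacked, gap = robustness_gap(X, y, w, b, eps=0.3)
print(f"clean={clean:.2f} attacked={attacked:.2f} gap={gap:.2f}")
```

Here the two samples near the boundary flip under the perturbation while the other two do not, so clean accuracy is 1.0, attacked accuracy is 0.5, and the gap is 0.5. A defense reduces the gap if it raises attacked accuracy, lowers clean accuracy, or both, which is why the abstract reports clean and attacked accuracy alongside gap reduction.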
