Abstract: This study uses publicly available datasets to identify network behaviour patterns in Virtual Network Function (VNF) and Physical Network Function (PNF) traffic. Building on previous research that created the VNF-focused VNFCYBERDATA dataset, this work compares VNFCYBERDATA with PNF-based datasets, including CIC-IDS2017 and CIC-Bell-DNS-EXF-2021, as well as a further PNF dataset collected as part of this study. The objective is to examine the differences and similarities in traffic patterns across these environments. In addition, this research proposes stacked deep learning models that integrate base algorithms, such as gated recurrent units, long short-term memory, and multilayer perceptrons, with meta-learning algorithms, including random forest, support vector machines, logistic regression, and decision trees. The goal is to determine whether a VNF-based dataset is essential for training a model to prevent VNF attacks, or whether a PNF-based dataset could be equally effective in a VNF environment (and vice versa). The results indicate that network traffic data from PNF environments may not transfer directly to a VNF environment without recalibration. Specifically, VNF–PNF differences are most significant and most consistent in timing-derived features (inter-arrival time (IAT), jitter, and their derivatives), whereas PNF–PNF comparisons are generally more similar overall but can still diverge in some aggregate timing features. The findings also show that domain-specific training and prediction is often the most effective approach, although cross-domain traffic categorisation remains possible.
Ayodele et al. (Thu,) studied this question.
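To make the timing-derived features mentioned in the abstract concrete, the following minimal sketch computes inter-arrival times and a simple jitter estimate from a list of packet timestamps. It is an illustration under stated assumptions, not the feature extraction used in the study: the function names, the definition of jitter as the mean absolute change between consecutive inter-arrival times, and the two sample flows are all hypothetical.

```python
from statistics import mean, pstdev


def inter_arrival_times(timestamps):
    """Return the gaps (in seconds) between consecutive packet timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def timing_features(timestamps):
    """Summarise a flow with a few timing-derived features.

    Assumption: jitter is taken as the mean absolute difference between
    consecutive inter-arrival times; the study may define it differently.
    """
    iats = inter_arrival_times(timestamps)
    if not iats:
        # A flow with fewer than two packets has no timing information.
        return {"iat_mean": 0.0, "iat_std": 0.0, "jitter": 0.0}
    deltas = [abs(b - a) for a, b in zip(iats, iats[1:])]
    return {
        "iat_mean": mean(iats),
        "iat_std": pstdev(iats),
        "jitter": mean(deltas) if deltas else 0.0,
    }


# Hypothetical flows (timestamps in seconds), invented for illustration only.
pnf_flow = [0.0, 1.0, 2.0, 3.0, 4.0]  # evenly spaced packets
vnf_flow = [0.0, 1.0, 3.0, 4.0, 6.0]  # alternating 1 s and 2 s gaps

pnf_features = timing_features(pnf_flow)
vnf_features = timing_features(vnf_flow)
```

In this toy case the two flows carry the same number of packets yet differ in every timing feature, which is the kind of divergence that per-feature comparisons between environments are intended to surface.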