We present BigFlow-NIDS, a large-scale, NetFlow-based dataset for network intrusion detection research in big-data environments. BigFlow-NIDS contains 66,935,021 flows, 55 flow attributes, and 32 fine-grained attack categories, and is available in both CSV and Parquet formats to support scalable ML and streaming analyses. Under the paper's Colab setup, loading the Parquet version took 27.35 s versus 920.82 s for CSV, a roughly 34-fold reduction in read time that demonstrates the importance of columnar storage for large NIDS corpora. The dataset contains 36.6 million benign flows and 30.3 million attack flows (roughly 55% benign to 45% attack), a moderate class imbalance. We release BigFlow-NIDS and provide baseline exploratory analyses and anomaly-detection experiments to support the development and evaluation of scalable, temporally aware intrusion detection systems.
Uddin et al. studied this question.
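The read-time and class-balance figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' benchmarking code: the helper names `time_loader`, `speedup`, and `class_shares` are hypothetical, and the constants are simply the numbers reported above.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def time_loader(loader: Callable[[], T]) -> Tuple[T, float]:
    """Call loader() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = loader()
    return result, time.perf_counter() - start


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast measurement is than the slow one."""
    return slow_seconds / fast_seconds


def class_shares(benign: float, attack: float) -> Tuple[float, float]:
    """Return the (benign, attack) fractions of the combined flow count."""
    total = benign + attack
    return benign / total, attack / total


# Figures as reported in the abstract (not re-measured here).
CSV_SECONDS = 920.82
PARQUET_SECONDS = 27.35
BENIGN_FLOWS = 36.6e6
ATTACK_FLOWS = 30.3e6

ratio = speedup(CSV_SECONDS, PARQUET_SECONDS)
benign_share, attack_share = class_shares(BENIGN_FLOWS, ATTACK_FLOWS)

print(f"Parquet vs CSV read-time ratio: {ratio:.1f}x")
print(f"Benign share: {benign_share:.1%}, attack share: {attack_share:.1%}")
```

To reproduce the comparison on a local copy, `time_loader` could wrap pandas readers, for example `time_loader(lambda: pd.read_csv(csv_path))` and `time_loader(lambda: pd.read_parquet(parquet_path))`, where `csv_path` and `parquet_path` are placeholders for wherever the files are stored. `pd.read_parquet` needs a Parquet engine such as pyarrow installed, and absolute timings will differ from the Colab numbers depending on hardware and storage.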