What question did this study set out to answer?

This research aims to address the deficiency of realistic malware traffic datasets for ML-based NIDS.

February 16, 2026 | Open Access

Bridging the Data Gap in ML-Based NIDS: An Automated Honeynet Platform for Generating Real-World Malware Traffic Datasets

Key Points

Targets the shortage of realistic, up-to-date malware traffic datasets for ML-based NIDS.
Developed an automated platform that generates malware traffic datasets.
Utilized a production-environment honeynet (T-Pot) in a university network for data capture.
Deployed high-interaction honeypots including Dionaea and Cowrie.
Implemented filtering based on honeypot logs and malware analysis tools like VirusTotal.
Successfully captured and filtered live attack traffic.
Produced the IPN-UAN-23 dataset, a curated collection of malicious network traffic.
Provides a continuous stream of actionable intelligence for developing and refining robust ML-based NIDS.

Abstract

The effectiveness of Machine Learning (ML)-based Network Intrusion Detection Systems (NIDS) is critically hampered by the scarcity of realistic and up-to-date malware traffic datasets. To address this gap, we present an automated platform for generating real-world malware traffic datasets. Our solution leverages a production-environment honeynet (T-Pot), deployed within a university network and segmented via a secure WireGuard VPN, to capture live attacks using high-interaction honeypots (Dionaea, Cowrie, ADBhoney). A fully automated pipeline handles traffic capture, transfer, filtering based on honeypot logs, and malware analysis (VirusTotal, VxAPI). The output is the IPN-UAN-23 dataset—a curated, labeled corpus of malicious network traffic. This platform functions as a vital automated security tool, providing the continuous stream of actionable intelligence required to develop and refine robust ML-based NIDS within a DevSecOps lifecycle.
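The log-based filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes honeypot events arrive as JSON lines carrying `src_ip` and `timestamp` fields, that captured traffic has already been summarized into flow records with `src_ip` and `start` fields, and that a flow counts as attack traffic when its source address was logged by a honeypot within a fixed time window. The function names, the record layout, and the five-minute window are all hypothetical.

```python
import json
from datetime import datetime, timedelta


def load_attacker_events(log_lines):
    """Parse JSON-lines honeypot events into {src_ip: [timestamps]}.

    Assumed (hypothetical) record layout: each line is a JSON object
    with an ISO 8601 "timestamp" and a "src_ip" field.
    """
    events = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        ts = datetime.fromisoformat(record["timestamp"])
        events.setdefault(record["src_ip"], []).append(ts)
    return events


def filter_flows(flows, events, window=timedelta(minutes=5)):
    """Keep flows whose source IP was logged by a honeypot
    within `window` of the flow's start time."""
    kept = []
    for flow in flows:
        start = datetime.fromisoformat(flow["start"])
        hits = events.get(flow["src_ip"], [])
        if any(abs(start - ts) <= window for ts in hits):
            kept.append(flow)
    return kept


# Illustrative data only (documentation address ranges, made-up times).
sample_log = [
    '{"src_ip": "203.0.113.7", "timestamp": "2023-05-01T12:00:00+00:00"}',
]
sample_flows = [
    {"src_ip": "203.0.113.7", "dst_port": 22, "start": "2023-05-01T12:01:30+00:00"},
    {"src_ip": "203.0.113.7", "dst_port": 22, "start": "2023-05-01T14:00:00+00:00"},
    {"src_ip": "198.51.100.9", "dst_port": 443, "start": "2023-05-01T12:00:10+00:00"},
]

malicious = filter_flows(sample_flows, load_attacker_events(sample_log))
```

With the sample data, only the first flow is kept: the second falls outside the time window and the third comes from an address that no honeypot logged. Matching on source address plus a time window is one simple design choice; a real deployment might instead match on full connection tuples or on honeypot session identifiers.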
