What question did this study set out to answer?

The study reviews current techniques for generating and labeling the datasets used to train machine-learning-based cyber attack detection systems.

April 8, 2026 · Open Access

Generating Pattern‐Based Datasets for Cyber Attack Detection Using Machine‐Learning Techniques

Key Points

Review of existing literature on attack pattern datasets.
Comparative analysis of various machine learning techniques for detection.
Definition of criteria for selecting relevant works to analyze.
Identification of heterogeneity in the criteria for research on attack pattern datasets.
Emphasis on the need for up-to-date datasets that cover as wide a range of attack patterns as possible.
Recognition of new algorithms, particularly those related to deep learning, as beneficial for identifying attack patterns.

Abstract

This work reviews the state of the art in the design, generation, and labeling of attack pattern datasets for training machine-learning-based detection systems. A comparative study of different proposals is carried out to identify shortcomings and areas for improvement in this field, and its findings are intended to serve as a starting point for future work. To this end, search and quality criteria were defined to select a suitable set of works for review. A detailed analysis of the selected publications reveals considerable heterogeneity in the criteria used in research on attack pattern datasets. The variety of datasets analyzed, together with the disparity in the choice of metrics and classification algorithms, makes it very difficult to establish clear criteria for comparison. Further research is needed in this area to produce up-to-date datasets that include as broad a catalogue of attack patterns as possible. It is also worth highlighting the value of newer techniques and algorithms for identifying attack patterns, such as those based on Deep Learning.

This article is categorized under:
Technologies > Classification
Technologies > Machine Learning
Fundamental Concepts of Data and Knowledge > Information Repositories

Cite This Study

García et al. studied this question.

synapsesocial.com/papers/69d5f03374eaea4b11a799e4
https://doi.org/10.1002/widm.70081