ABSTRACT
This work reviews the state of the art in the design, generation, and labeling of attack pattern datasets for training machine learning-based detection systems. A comparative study of different proposals is carried out to detect shortcomings and areas for improvement in this field, which will serve as a starting point for future work. To this end, search and quality criteria have been defined to select a suitable set of works for review. A detailed analysis of the publications under study reveals heterogeneous criteria in research on attack pattern datasets. The variety of datasets analyzed, as well as the disparity in the selection of metrics and classification algorithms, makes it very difficult to establish clear comparative criteria. Further research is needed in this area to produce up-to-date datasets that include as broad a catalogue of attack patterns as possible. It is also worth highlighting the value of new techniques and algorithms for identifying attack patterns, such as those related to Deep Learning.

This article is categorized under:
Technologies > Classification
Technologies > Machine Learning
Fundamental Concepts of Data and Knowledge > Information Repositories
García et al. (Mon,) studied this question.