Malware continues to be one of the top cybersecurity threats globally, ranking among the most critical threats in North America and Europe. Its rapid spread and increasing sophistication make accurate detection a top priority for organizations seeking to protect their infrastructure and sensitive data. Convolutional Neural Networks (CNNs), known for their strength in visual pattern recognition, have proven effective at detecting malware: malware files are converted into images, which the network then classifies as it would any other image. However, one major challenge in applying CNNs to malware detection is imbalanced data, where certain malware classes are underrepresented. This study evaluates the impact of various imbalance-handling techniques on CNN performance in the context of malware classification. Experimental results indicate that effective malware classification depends on pairing CNN learning with a suitable resampling strategy. ADASYN sharpens decision boundaries, whereas combined random oversampling and random undersampling (ROS+RUS) risks overfitting and therefore requires discriminative feature learning.
Zhu et al. (Mon,) studied this question.
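
The two ideas summarized above, rendering a binary as an image and rebalancing classes by random resampling, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names `bytes_to_grayscale` and `random_resample`, the fixed image width of 64, and the per-class `target` count are all assumptions introduced here. ADASYN is not shown; it additionally generates synthetic minority samples, for which a library such as imbalanced-learn would typically be used.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 64) -> np.ndarray:
    """Map raw bytes to a 2D uint8 array, one byte per pixel.

    The width is a hypothetical choice; the last row is zero-padded.
    """
    buf = np.frombuffer(data, dtype=np.uint8)
    height = max(1, -(-buf.size // width))  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:buf.size] = buf
    return padded.reshape(height, width)


def random_resample(labels: np.ndarray, target: int, seed: int = 0) -> np.ndarray:
    """Return sample indices so that every class has exactly `target` samples.

    Classes smaller than `target` are oversampled with replacement (ROS);
    classes at or above `target` are undersampled without replacement (RUS).
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        replace = idx.size < target  # duplicates are needed only for small classes
        chosen.append(rng.choice(idx, size=target, replace=replace))
    return np.concatenate(chosen)


if __name__ == "__main__":
    image = bytes_to_grayscale(bytes(range(100)), width=16)
    print(image.shape)

    labels = np.array([0] * 10 + [1] * 2)  # toy imbalanced label vector
    balanced = random_resample(labels, target=5)
    print(np.bincount(labels[balanced]))
```

Because oversampling with replacement only duplicates existing minority samples, the network sees the same images repeatedly, which is one way to read the overfitting risk the abstract attributes to ROS+RUS.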