Cybersecurity vulnerabilities represent a critical threat to information systems, often leading to data breaches and operational disruptions. Accurate assessment of vulnerability severity is therefore essential for effective risk prioritization. The Common Vulnerabilities and Exposures (CVE) system maintains a catalog of such vulnerabilities, each accompanied by a brief textual description and a severity score, typically assigned using the Common Vulnerability Scoring System (CVSS). However, manually assigning severity scores is time-consuming and resource-intensive. This challenge highlights the need for automated approaches capable of predicting severity directly from textual data. In this study, we explore the automatic prediction of CVE severity levels from textual descriptions using machine learning. To address class imbalance, we leverage GPT-Neo, a generative language model, to synthetically augment underrepresented categories. We then fine-tune a DeBERTa-based deep learning model for classification, achieving high accuracy in predicting severity levels from text alone. To enhance interpretability, we employ Local Interpretable Model-Agnostic Explanations (LIME) to identify key terms and phrases that most strongly influence model decisions. This approach demonstrates strong predictive performance and provides insight into the linguistic patterns associated with vulnerability severity.
Yasin et al. studied this question.
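To make the notions of "severity level" and "class imbalance" concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes the labels follow the CVSS v3.x qualitative severity rating scale (None, Low, Medium, High, Critical); the abstract does not state which CVSS version or label set is used. The helper `augmentation_targets` and the example scores are hypothetical, and balancing every class up to the size of the largest one is only one possible target for synthetic augmentation.

```python
from collections import Counter


def severity_label(score: float) -> str:
    """Map a CVSS base score to a qualitative severity level.

    Assumption: CVSS v3.x rating scale (0.0 None, 0.1-3.9 Low,
    4.0-6.9 Medium, 7.0-8.9 High, 9.0-10.0 Critical). The source
    does not say which scale the study uses.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"


def augmentation_targets(labels: list[str]) -> dict[str, int]:
    """Hypothetical helper: number of synthetic samples each class would
    need to match the largest class (one possible balancing target)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: largest - n for label, n in counts.items()}


# Made-up scores for illustration only; not data from the study.
example_scores = [9.8, 7.5, 7.8, 5.3, 8.8, 6.1, 3.3, 7.2]
example_labels = [severity_label(s) for s in example_scores]

print(Counter(example_labels))
print(augmentation_targets(example_labels))
```

In this toy sample the High class dominates, so the other classes would be the ones a generative model is asked to supplement. How many synthetic descriptions to generate per class is a design choice that the abstract does not specify.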