Background/Objectives: Vaccine-preventable diseases remain a persistent public health challenge in regions characterized by structural vulnerabilities, including suboptimal vaccination coverage, socioeconomic deprivation, and limited access to healthcare. In South-West Romania, persistent vaccination gaps and recurrent outbreaks generate a sustained public health burden that requires ongoing preventive risk management. In such contexts, digital risk stratification tools may support preventive decision-making by enabling early identification of patients at increased risk of severe outcomes. This study applied machine learning techniques to routinely collected measles surveillance data from South-West Romania to identify severe disease cases and determine key predictors of severity, offering a pragmatic alternative to outbreak forecasting in a resource-constrained setting. Methods: An open epidemiological dataset of laboratory-confirmed measles cases reported by the Regional Center for Public Health Surveillance Craiova was analyzed. Severe cases were defined in the dataset as those with pneumonia, thrombocytopenia, a hospital stay exceeding three days, or other documented complications requiring medical intervention. Random Forest (RF) and Logistic Regression (LR) classifiers were trained and compared using a 10-fold cross-validation framework across 200 resampling iterations. Model performance was assessed using accuracy, AUC-ROC, sensitivity, specificity, positive predictive value, and F1-score. Feature importance was quantified using permutation-based measures, and the highest-ranked predictors were further evaluated through chi-square tests of independence. Results: RF significantly outperformed LR in accuracy (0.84 vs. 0.82), AUC (0.87 vs. 0.80), specificity (0.87 vs. 0.84), positive predictive value (0.89 vs. 0.86), and F1-score (0.84 vs. 0.83), with p ≤ 0.001 for most metrics. Sensitivity did not differ significantly between models (approximately 0.81; p = 0.328). Feature importance analysis identified seven key predictors: county of residence, vaccination status, outbreak status, presence of other symptoms, occupation, cough, and conjunctivitis. All seven were significantly associated with disease severity, and six showed significant geographic variation across counties. Vâlcea County had the highest concentration of severe cases. Because the model was trained on a regional surveillance cohort in which symptomatic and hospitalized cases are over-represented, it should be interpreted as a triage-support tool within this surveillance context rather than as a population-level severity estimator. Conclusions: Machine learning, particularly RF, can effectively identify severe measles cases using routinely collected surveillance data in settings where robust outbreak prediction is not feasible. County of residence functioned as a composite proxy for structural determinants, including healthcare access, vaccination coverage, and socioeconomic deprivation. These findings support the use of ML-based severity classification as a pragmatic tool for clinical risk stratification and targeted public health intervention in resource-constrained environments.
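As an illustration of how the threshold-based performance metrics named above (accuracy, sensitivity, specificity, positive predictive value, and F1-score) relate to one another, the following minimal sketch computes them from the four cells of a binary confusion matrix. The class `ConfusionCounts`, the function `classification_metrics`, and the example counts are hypothetical and chosen only for illustration; they are not the study's code or data, and AUC-ROC is omitted because it requires predicted scores rather than counts.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (severe = positive class)."""
    tp: int  # severe cases predicted severe
    fp: int  # non-severe cases predicted severe
    tn: int  # non-severe cases predicted non-severe
    fn: int  # severe cases predicted non-severe


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute standard metrics from confusion-matrix counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall on severe cases
    specificity = _ratio(c.tn, c.tn + c.fp)   # recall on non-severe cases
    ppv = _ratio(c.tp, c.tp + c.fp)           # positive predictive value
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts for demonstration only, not taken from the study.
    example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

In a repeated cross-validation design such as the one described, metrics of this kind would typically be computed per held-out fold and then summarized across resampling iterations.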
Baiașu et al. studied this question.