Traffic accidents are among the leading causes of injury worldwide, so understanding the factors that drive accident occurrence and severity is essential for improving road safety and reducing injuries and fatalities. This study analyzes the U.S. Accidents dataset, which covers 2016 to 2023, to identify the key determinants of accident severity and to evaluate feature-selection techniques for predictive modeling. Several established feature-selection methods are examined, namely L1-regularized logistic regression, minimum redundancy maximum relevance (mRMR), conditional mutual information maximization (CMIM), ReliefF, and tree-based importance measures, and these are compared with the Weighted Conditional Mutual Information (WCFR) method. The selected feature subsets are then evaluated with three machine learning models: logistic regression, random forest, and XGBoost. Experimental results show that WCFR consistently outperforms the competing methods, achieving higher validation accuracy (up to approximately 0.84) and higher Macro-F1 scores (up to approximately 0.55) while using fewer features and preserving model interpretability. These results indicate that WCFR is particularly effective for accident severity modeling and highlight its potential as a robust feature-selection strategy for large-scale transportation safety analytics and future severity prediction studies.
Alobidan et al. studied this question.
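The abstract reports both validation accuracy and Macro-F1, and the two can differ substantially when severity classes are imbalanced. The following minimal sketch illustrates how the metrics are computed and why they can diverge. The labels are hypothetical toy values chosen for illustration, not drawn from the U.S. Accidents dataset, and the helper names `accuracy` and `macro_f1` are assumptions rather than code from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: four severity levels, heavily skewed toward level 2.
y_true = [2] * 16 + [3] * 2 + [1] + [4]
# A degenerate classifier that always predicts the majority class.
y_pred = [2] * 20

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the majority-class predictor scores well on accuracy but poorly on Macro-F1, because each minority class contributes an F1 of zero to the unweighted average. This is one plausible reason to report both metrics for severity prediction, as the abstract does.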