What question did this study set out to answer?

The aim is to identify key factors influencing accident severity and evaluate feature-selection methods for predictive modeling.

March 25, 2026 (Open Access)

Feature Selection for Accident Severity Modeling: A WCFR-Based Analysis on the U.S. Accidents Dataset

Key Points

Analysis of the U.S. Accidents dataset from 2016 to 2023.
Evaluation of several feature-selection techniques, including L1-regularized logistic regression and WCFR.
Comparison of the selected feature subsets using logistic regression, random forest, and XGBoost models.
WCFR shows superior performance, with validation accuracy reaching approximately 0.84.
WCFR reaches Macro-F1 scores of up to about 0.55 while using fewer features.
WCFR enhances model interpretability while effectively predicting accident severity.
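The gap between validation accuracy (about 0.84) and Macro-F1 (about 0.55) is easier to read once both metrics are computed side by side. Macro-F1 is the unweighted mean of per-class F1 scores, so severity levels that a model rarely gets right pull it down even when overall accuracy stays high. The sketch below is plain Python with hypothetical toy labels (severity levels 1 to 4, with level 2 dominant); it illustrates the two metrics only and is not the study's data or evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # A class with no correct predictions contributes an F1 of 0.
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: severity levels 1-4, dominated by level 2.
y_true = [2] * 80 + [3] * 12 + [4] * 5 + [1] * 3
# A trivial classifier that always predicts the majority class.
y_pred = [2] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.80
print(f"macro-F1 = {macro_f1(y_true, y_pred):.2f}")  # macro-F1 = 0.22
```

Always predicting the majority class already yields 0.80 accuracy on these toy labels, while Macro-F1 falls to 0.22 because three of the four classes receive an F1 of zero. This is the same averaging scikit-learn's f1_score applies with average="macro", which by default covers every label present in either y_true or y_pred.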

Abstract

Traffic accidents are among the leading causes of injury worldwide, highlighting the urgent need to better understand the factors that contribute to accident occurrence and severity in order to improve road safety and reduce injuries and fatalities. This study analyzes the U.S. Accidents dataset, comprising data collected from 2016 to 2023, to identify the key determinants of accident severity and to evaluate feature-selection techniques for predictive modeling. To this end, several feature-selection methods are examined, including L1-regularized logistic regression, minimum redundancy maximum relevance (mRMR), conditional mutual information maximization (CMIM), ReliefF, and tree-based importance measures; these are compared with WCFR, a weighted conditional-mutual-information-based method. The selected feature subsets are then evaluated using three machine learning models: logistic regression, random forest, and XGBoost. Experimental results show that WCFR consistently outperforms the competing methods, achieving higher validation accuracy (up to approximately 0.84) and Macro-F1 scores (up to approximately 0.55) while using fewer features and maintaining model interpretability. These results indicate that WCFR is particularly effective for accident severity modeling and highlight its potential as a robust feature-selection strategy for large-scale transportation safety analytics and future severity prediction studies.
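The abstract lists CMIM among the baselines and describes WCFR as a weighted conditional mutual information method. The following is a minimal sketch of greedy selection with Fleuret's CMIM criterion, not WCFR itself, whose weighting scheme is not given here. It assumes small, already-discretized features and uses plug-in entropy estimates; the feature names and data are hypothetical. At each step the sketch adds the candidate X that maximizes the minimum of I(X; Y | S) over already-selected features S, so a feature that merely duplicates a chosen one scores near zero.

```python
import math
from collections import Counter


def entropy(*columns):
    """Plug-in joint entropy (in nats) of one or more equal-length discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log(c / n) for c in Counter(joint).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def cmim_score(x, y, chosen):
    """Relevance of x alone at the first step, otherwise its worst-case
    conditional relevance given each already-chosen column."""
    if not chosen:
        return mutual_info(x, y)
    return min(cond_mutual_info(x, y, z) for z in chosen)


def cmim_select(features, y, k):
    """Greedily pick up to k feature names from a dict of name -> column."""
    selected = []
    remaining = sorted(features)  # sorted so that exact ties break alphabetically
    while remaining and len(selected) < k:
        chosen = [features[name] for name in selected]
        best = max(remaining, key=lambda name: cmim_score(features[name], y, chosen))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: severity depends on weather and night; weather_copy
# duplicates weather, and noise is independent of everything else.
weather = [0, 0, 0, 0, 1, 1, 1, 1] * 2
night = [0, 0, 0, 1, 0, 0, 0, 1] * 2
noise = [0] * 8 + [1] * 8
severity = [1 + 2 * w + n for w, n in zip(weather, night)]

features = {"weather": weather, "weather_copy": list(weather), "night": night, "noise": noise}
print(cmim_select(features, severity, k=2))  # ['weather', 'night']
```

On these toy columns, weather_copy ties with weather at the first step but adds essentially no conditional information once weather is selected, so the second pick is night rather than the redundant copy. Real U.S. Accidents columns (continuous weather readings, timestamps, locations) would need binning or a different estimator before a plug-in sketch like this applies.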


Cite This Study

Alobidan et al. (2026) studied this question.

synapsesocial.com/papers/69c37b33b34aaaeb1a67d587
https://doi.org/10.3390/electronics15061308
