ABSTRACT Most existing research on collision severity analyses severity after a collision has occurred, which offers little support for collision prevention. This paper proposes a method for predicting the severity of potential collisions: a model that estimates the likely consequences of a collision before it occurs, providing a basis for quantifying driving risk. Developing the model requires addressing two challenges: how to characterise the severity of potential collisions effectively, and how to manage the class imbalance caused by the scarcity of severe collisions. For the first challenge, we introduce a systematic approach for identifying the features most representative of potential collision severity. For the second, we propose a distribution‐preserving resampling method comprising two techniques, Remove Redundant Under Sampling (RRUS) and Core Seed‐based Synthetic Minority Oversampling Technique (CS‐SMOTE), which together transform the imbalanced dataset into a balanced one while preserving the distribution characteristics of the original dataset. Finally, a potential collision severity prediction model is developed using the National Highway Traffic Safety Administration (NHTSA) dataset and the XGBoost algorithm. The results show that the model achieves a prediction accuracy of over 97.7%, outperforming comparison models built with other classification algorithms.
Zhao et al. (Thu,) studied this question.
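To make the oversampling idea named in the abstract concrete, the following is a minimal sketch of plain SMOTE-style interpolation, the general technique that CS‐SMOTE builds on. It is not the paper's CS‐SMOTE: the core-seed selection is not described in the abstract and is not reproduced here. The function name `smote_like_oversample` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (generic SMOTE-style sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes X_min holds at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared Euclidean distances between minority samples.
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random gap in [0, 1) along each segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy minority class (made-up values, not NHTSA data).
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_oversample(X_minority, n_new=10)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class, which is the basic reason interpolation-based oversampling tends to respect the original distribution better than naive duplication.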