What question did this study set out to answer?

To develop a predictive model for estimating potential collision severity before accidents occur.

February 11, 2026 | Open Access

Potential Collision Severity Prediction Based on Data Distribution‐Preserving Resampling

Key Points

Developing a predictive model that estimates potential collision severity before accidents occur
Characterizing potential collision severity using systematic feature selection
Implementing a distribution-preserving resampling method to address class imbalance
Applying Remove Redundant Under Sampling (RRUS) and Core Seed-based SMOTE (CS-SMOTE) techniques
Utilizing the XGBoost algorithm on the NHTSA dataset
Achieving a prediction accuracy of over 97.7%
Outperforming comparison models built with other classification algorithms
Transforming the imbalanced dataset into a balanced one while preserving the original distribution characteristics

Abstract

The majority of existing research on collision severity focuses on post‐collision severity, which is not conducive to collision prevention. This paper proposes a novel method for predicting the severity of potential collisions, aiming to establish a prediction model to predict the potential consequences of collisions before they occur, providing a basis for quantifying driving risk. In developing this model, two key challenges are addressed: how to effectively characterise the severity of potential collisions and how to manage the class imbalance caused by the scarcity of severe collisions. To tackle the first challenge, we introduce a systematic approach to find the most representative features of potential collision severity. For the second challenge, we propose a distribution‐preserving resampling method to address the class imbalance. This approach includes two techniques: Remove Redundant Under Sampling (RRUS) and Core Seed‐based Synthetic Minority Oversampling Technique (CS‐SMOTE), which transform the imbalanced dataset into a balanced one while preserving the distribution characteristics of the original dataset. Finally, using the National Highway Traffic Safety Administration (NHTSA) dataset and the XGBoost algorithm, a potential collision severity prediction model is developed. The results demonstrate that the model achieves a prediction accuracy of over 97.7%, outperforming comparison models developed using other classification algorithms.
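
The abstract does not spell out how RRUS or CS-SMOTE work, so the sketch below illustrates only the generic idea that SMOTE-style oversampling builds on: new minority samples are interpolated between an existing minority sample and one of its nearest minority neighbours. It is plain SMOTE on made-up toy data, not the authors' method; the function name `smote_oversample`, its parameters, and the data points are all assumptions chosen for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Hypothetical helper: plain SMOTE, not the paper's CS-SMOTE."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Random point on the segment between base and neighbour
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy, made-up data: 8 "minor" records versus 3 "severe" ones, two features each.
majority = [(float(x), float(y)) for x in range(4) for y in range(2)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. The abstract's claim of preserving the original distribution presumably refines this idea, but those details are not given here.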

Cite This Study

Zhao et al. studied this question.

synapsesocial.com/papers/698c1cd3267fb587c655f89e https://doi.org/10.1049/itr2.70163
