What question did this study set out to answer?

The aim is to improve the prediction of injury severity in older drivers involved in accidents, addressing challenges like class imbalance and label noise.

April 22, 2026 | Open Access

Injury Severity Prediction for Older Driver Accidents via Denoised Cascade Framework and Probability Calibration

Key Points

Targeted injury severity prediction for older drivers involved in accidents, where class imbalance and label noise are the main obstacles.
Developed a Log-Loss Cleaned and Probability-Calibrated Cascade framework for injury severity prediction.
Implemented noise filtering to clean training data and enhance representation learning.
Utilized a two-stage cascade model with a Preliminary Screening Model and a Stacking ensemble classifier for fine-grained classification.
Achieved a Macro-F1 score of 0.7296 across injury severity classes.
Improved recall and F1-score for the severe and fatal category by over 82% and 62% in relative terms, respectively, compared to the best-performing baseline.
Validated the contributions of data cleaning and calibration through ablation analyses.
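
The metrics named in the key points can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code: the function names are hypothetical, the toy labels are invented for the example, and taking the class set from the true labels is an assumption. It shows the F-beta score (beta = 2 gives the F2-score, which weights recall more heavily than precision), Macro-F1 as the unweighted mean of per-class F1, and what a relative improvement over a baseline means.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true))  # assumption: classes are taken from the true labels
    scores = [f_beta(*precision_recall(y_true, y_pred, c)) for c in labels]
    return sum(scores) / len(scores)


def relative_improvement(new, baseline):
    """Fractional gain over a baseline, e.g. 0.82 means 82% higher than baseline."""
    return (new - baseline) / baseline


# Toy data (not from the study): a model that never predicts the rare class 1.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
print(macro_f1(y_true, y_pred))        # accuracy is 0.75, but Macro-F1 is much lower
print(f_beta(0.5, 1.0, beta=2.0))      # F2 rewards the perfect recall
```

The toy case illustrates why a model tuned for global accuracy can look acceptable while missing every rare severe case, which is the failure mode the study describes.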

Abstract

Accurately estimating the severity of crash injuries among older drivers is paramount for enhancing traffic safety, a task challenged by class imbalance and label noise. Traditional predictive paradigms often struggle to identify rare severe cases because they tend to prioritize global accuracy, compromising sensitivity to high-risk outcomes. To overcome these limitations, this study develops a Log-Loss Cleaned and Probability-Calibrated Cascade (L-CSC) framework that integrates existing algorithmic components for robust and reliable severity prediction. First, a Log-Loss-based noise filtering mechanism removes outliers and ambiguous samples from the training data, enabling higher-quality representation learning. Next, a two-stage cascade architecture decouples the classification task. Stage I employs a Preliminary Screening Model, tuned via Bayesian optimization with the F2-score as the objective, to maximize recall for severe and fatal cases. Stage II deploys a Stacking ensemble classifier for fine-grained classification of injury levels among the cases identified in the initial screening. Finally, Isotonic Regression calibrates the output probabilities from both stages so that the resulting risk estimates are statistically sound and reliable. Empirical evaluations demonstrate that the L-CSC framework balances overall performance with critical risk detection, achieving a Macro-F1 of 0.7296. Compared to the best-performing baseline, recall and F1-score for the critical severe and fatal category showed relative improvements of over 82% and 62%, respectively. Ablation analyses further substantiate the contributions of both the data cleaning and calibration modules. This research demonstrates that the cascaded framework mitigates the biases inherent in imbalanced datasets, providing a robust algorithmic foundation to potentially support future traffic safety interventions.
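
Two of the components named in the abstract, Log-Loss-based filtering and Isotonic Regression calibration, can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions and is not the authors' implementation: the function names and the `keep_fraction` parameter are hypothetical (the abstract does not state a filtering threshold), and the calibration is a plain pool-adjacent-violators fit written out by hand rather than a library call.

```python
import math


def sample_log_loss(prob_true_class, eps=1e-15):
    """Negative log of the probability a model assigned to the labeled class."""
    p = min(max(prob_true_class, eps), 1.0)
    return -math.log(p)


def filter_noisy(probs_true_class, keep_fraction=0.9):
    """Indices of the samples with the lowest per-sample log-loss.

    keep_fraction is a hypothetical knob; the source gives no threshold.
    """
    losses = [sample_log_loss(p) for p in probs_true_class]
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    n_keep = max(1, int(round(keep_fraction * len(losses))))
    return sorted(order[:n_keep])


def isotonic_fit(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing map from score to probability.

    Returns (thresholds, values): the upper score bound and mean label of each block.
    """
    blocks = []  # each block is [sum_of_labels, count, max_score]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while monotonicity is violated or scores are tied.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
            or blocks[-2][2] == blocks[-1][2]
        ):
            total, count, max_s = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = max_s
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def isotonic_predict(score, thresholds, values):
    """Step-function lookup; scores above the last threshold get the last value."""
    for t, v in zip(thresholds, values):
        if score <= t:
            return v
    return values[-1]


# Toy data (not from the study).
kept = filter_noisy([0.9, 0.8, 0.05, 0.7, 0.6], keep_fraction=0.8)
print(kept)  # the sample with probability 0.05 on its own label is dropped

thresholds, values = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(isotonic_predict(0.25, thresholds, values))
```

In practice a library implementation such as scikit-learn's `IsotonicRegression` would normally be used for the calibration step, fitted on held-out data rather than on the training set; whether the study did so is not stated in the abstract.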
