What question did this study set out to answer?

This study aims to assess how machine learning can classify the levels of responsibility in pedestrian crashes.

March 7, 2026 | Open Access

Evaluation of the level of responsibility in pedestrian crashes using machine learning algorithms

Key Points

This study aims to assess how machine learning can classify the levels of responsibility in pedestrian crashes.
Evaluated different supervised classification models using real pedestrian crash data.
Analyzed 14 binary variables across human, technological, structural, and normative subsystems.
Compared performance metrics of Decision Trees, Naïve Bayes, and Support Vector Machine models.
The Decision Trees model performed best among the evaluated models for classifying responsibility.
The most influential factor was possessing a driver's license (47.26%).
Other significant factors included the pedestrian's location (15.35%), driving under the influence of alcohol/drugs (7.24%), and distracted driving (7.04%).
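
One common way percentages like those above arise is impurity-based feature importance, where each variable's share of the total impurity reduction is normalized to sum to 100%. The summary does not state how the study computed its importances, so the following is only a minimal sketch under that assumption. It scores each binary variable by the Gini decrease of a single root split (a full decision tree would sum decreases over all its nodes). The variable names and the four toy records are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def root_split_gain(rows, labels, feature):
    """Impurity decrease from splitting once on a binary (0/1) feature."""
    left = [y for x, y in zip(rows, labels) if x[feature] == 0]
    right = [y for x, y in zip(rows, labels) if x[feature] == 1]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def importance_percent(rows, labels):
    """Normalize per-feature gains so they sum to 100 (or all 0 if no gain)."""
    gains = {f: root_split_gain(rows, labels, f) for f in rows[0]}
    total = sum(gains.values())
    return {f: (100.0 * g / total if total else 0.0) for f, g in gains.items()}


# Hypothetical toy records: illustrative only, NOT the study's data.
toy_rows = [
    {"has_license": 0, "on_crosswalk": 1},
    {"has_license": 0, "on_crosswalk": 0},
    {"has_license": 1, "on_crosswalk": 1},
    {"has_license": 1, "on_crosswalk": 0},
]
toy_labels = ["driver", "driver", "driver", "pedestrian"]

toy_importance = importance_percent(toy_rows, toy_labels)
```

Because the shares are normalized, a single dominant variable can account for a large fraction of the total even when many variables are considered.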

Abstract

Traffic crashes involving pedestrians tend to result in the most casualties (minor, serious, or fatal). Therefore, accurately determining the level of responsibility in a pedestrian crash is crucial, as liability can lead to civil, administrative, or criminal consequences. Despite its importance, the scientific literature contains very few studies focused specifically on the attribution of responsibility in traffic accidents, and even fewer on pedestrian collisions. This study evaluated different supervised classification models using Machine Learning (ML) techniques to classify the levels of responsibility of both drivers and pedestrians using real crash data. In this evaluation, 14 binary variables were considered based on four subsystems: human, technological, structural, and normative. The goal is to help judicial and police authorities attribute responsibility more efficiently and objectively, which involves analyzing the most influential variables after the classification process. Policymakers will then be able to use these assessments to develop new strategies for improving road safety. The dataset consists of 510 pedestrian crashes extracted from the reports of the Local Police of Badajoz (LPB) in Spain and judicial decisions of the Spanish Judiciary (SJ). Of the models analyzed, the Decision Trees (DT), Naïve Bayes (NB), and Support Vector Machine (SVM) models produced the best initial performance. These three models were then compared, and the metrics showed that the DT model is the best option. Furthermore, the feature importance analysis of the 14 variables revealed that possessing a driver's license is the most influential factor in determining responsibility (47.26%). The next most influential factors were the pedestrian's location (15.35%); driver under the influence of alcohol/drugs (7.24%); and distracted driving, e.g., using a mobile phone (7.04%).
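
The abstract says the three models were compared on performance metrics but does not name them. As a hedged illustration, the sketch below computes four metrics that are commonly used for such comparisons (accuracy, precision, recall, and F1) from true and predicted labels. It assumes a simplified two-class setting ("driver" vs. "pedestrian"); the study's actual responsibility levels, metrics, and predictions may differ, and the toy labels here are invented for illustration.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: illustrative only, NOT results from the study.
toy_true = ["driver", "driver", "pedestrian", "pedestrian"]
toy_pred = ["driver", "pedestrian", "pedestrian", "pedestrian"]

toy_scores = binary_metrics(toy_true, toy_pred, positive="driver")
```

Running the same function on each candidate model's predictions over a shared held-out set would give directly comparable scores.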
