What question did this study set out to answer?

This research aims to develop and assess machine learning models that predict the success of vehicle repossession.

May 15, 2026 · Open Access

Predictive modeling for auto repossession success: a comparative evaluation

Key Points

Developed and assessed machine learning models that predict whether a vehicle repossession will succeed.
Evaluated machine learning models using the Automobile Repossession Dataset (ARD).
Compared model performance under standard cross-validation and alternative data-splitting strategies by client and year.
Analyzed multiple classification models for their effectiveness in estimating vehicle recovery likelihood.
CatBoost achieved the highest Area Under the Receiver Operating Characteristic Curve (AUC-ROC) score, 0.692, under standard cross-validation.
Logistic regression, a simpler and faster model, performed competitively when evaluated on future data.
Alternative evaluation strategies affected the models to varying degrees, highlighting the need for robust validation on imbalanced datasets.
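
As an illustration of the AUC-ROC metric named in the key points, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen successful repossession receives a higher score than a randomly chosen unsuccessful one, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are assumptions made for this example. They are not taken from the study or from the ARD.

```python
def auc_roc(labels, scores):
    """Pairwise AUC-ROC: P(score of a positive > score of a negative), ties count 0.5.

    labels: iterable of 0/1 outcomes (1 = vehicle recovered, an assumed encoding).
    scores: iterable of model scores, higher meaning "more likely recovered".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
toy_auc = auc_roc(toy_labels, toy_scores)  # 4 of 6 positive/negative pairs are ordered correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Because the metric depends only on how positives rank against negatives, it is a common choice for imbalanced outcomes such as those described here.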

Abstract

Predicting whether a vehicle repossession will succeed can save lenders time and money, yet data and analytical models for this task are limited. This study develops and evaluates machine learning models to estimate the likelihood that a repossession assignment will result in vehicle recovery. Using the Automobile Repossession Dataset (ARD), a proprietary, anonymized multi-year collection of assignment records from a regional repossession company, we assess model performance under alternative evaluation and data-splitting strategies to examine fairness and generalization. In addition to standard cross-validation on the full ARD, we evaluate the performance of three classification models when the data are split by repossession client (to evaluate generalization ability and fairness concerns) or by year (to evaluate robustness against temporal distribution shift). Results and statistical analyses indicate that the models perform similarly under standard cross-validation, but that the alternative evaluation strategies affect each model to a different degree. For example, CatBoost achieves the highest Area Under the Receiver Operating Characteristic Curve (AUC-ROC) score (0.692) under standard cross-validation, whereas logistic regression, a simpler and faster model, performs competitively when evaluated on future data. These findings highlight that robust validation is essential for operational machine learning on imbalanced datasets and provide the first benchmark for repossession prediction. The study offers new insight for lenders and recovery agencies seeking data-driven efficiency improvements.
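
The three evaluation settings described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The record fields (`client`, `year`, `recovered`), the function names, and the toy records are all hypothetical. The plain unshuffled k-fold shown here is a simplification, since the abstract does not say whether the study's cross-validation was stratified or shuffled.

```python
# Hypothetical assignment records; the real ARD schema is not described in the abstract.
records = [
    {"client": "A", "year": 2021, "recovered": 1},
    {"client": "A", "year": 2022, "recovered": 0},
    {"client": "B", "year": 2021, "recovered": 0},
    {"client": "B", "year": 2023, "recovered": 1},
    {"client": "C", "year": 2022, "recovered": 0},
    {"client": "C", "year": 2023, "recovered": 0},
]


def k_fold(rows, k):
    """Standard k-fold cross-validation: every record is tested exactly once."""
    folds = [rows[i::k] for i in range(k)]  # interleaved folds, no shuffling
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


def split_by_client(rows, held_out_clients):
    """Hold out whole clients, so test clients are never seen during training."""
    train = [r for r in rows if r["client"] not in held_out_clients]
    test = [r for r in rows if r["client"] in held_out_clients]
    return train, test


def split_by_year(rows, cutoff_year):
    """Train on years before the cutoff and test on the cutoff year and later."""
    train = [r for r in rows if r["year"] < cutoff_year]
    test = [r for r in rows if r["year"] >= cutoff_year]
    return train, test


cv_folds = list(k_fold(records, 3))
client_train, client_test = split_by_client(records, {"C"})
year_train, year_test = split_by_year(records, 2023)
```

The client split probes whether a model generalizes to clients it has never seen, and the year split probes robustness to temporal shift. In both cases the key property is that nothing from the held-out group or period appears in the training data.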
