September 10, 2025 | Open Access

Vehicle Loan Default Prediction Based on Machine Learning

Xiaohui Wu, Tianjin University of Commerce

Key Points

XGBoost achieved a test-set AUC of 0.89, significantly outperforming logistic regression (AUC of 0.62).
Key risk factors identified include credit score, loan-to-value ratio over 85%, and self-employed status.
Feature engineering (such as outlier correction) and SMOTE oversampling improved data quality for loan prediction models.
The findings offer actionable insights for financial institutions to refine lending policies and mitigate losses.
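
To make the AUC figures above concrete: AUC can be read as the probability that a randomly chosen defaulter receives a higher predicted risk score than a randomly chosen non-defaulter, with ties counted as half. The sketch below computes it by pairwise comparison in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not code or data from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise (rank-based) AUC for binary labels, where 1 = default.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # defaulter ranked above non-defaulter
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (assumed): two non-defaulters, two defaulters.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

Read this way, an AUC of 0.89 means the model ranks a defaulter above a non-defaulter in about 89% of such pairs, while 0.62 is much closer to the 0.5 expected from random ranking.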

Abstract

Loan default prediction is a core issue in financial risk management, directly impacting credit decisions and capital allocation efficiency. This study is based on 200,000 anonymized loan records and uses feature engineering (e.g., standardization of credit history length, outlier correction) and SMOTE oversampling to optimize data quality. The performance of algorithms such as XGBoost and LightGBM was compared, with SHAP and LIME used to enhance model interpretability. Results show that XGBoost achieved the best performance (test set AUC=0.89), significantly outperforming logistic regression (AUC=0.62). Key risk factors include credit score (SHAP mean value=0.32), high loan-to-value ratio (LTV>85%, OR=2.1 for default risk), and self-employed status (default probability 1.8 times higher than for salaried individuals). The model combines high accuracy with consistency with business logic. The study recommends LTV>85% and PERFORMCNSSCORE<500 as screening criteria for high-risk customers, giving financial institutions a tool for dynamic risk management. These findings not only validate the effectiveness of advanced machine learning in credit risk assessment but also offer actionable insights for refining lending policies and reducing potential losses. Future research could incorporate macroeconomic variables to enhance dynamic prediction capabilities.
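
The two screening thresholds named in the abstract can be written as a simple rule. The following is a minimal sketch, not the paper's implementation: the `Applicant` class, its field names, and the helper functions are hypothetical. The abstract also does not say whether a customer must meet both criteria or either one, so the `require_both` switch is an assumption that leaves the choice to the reader.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract.
LTV_THRESHOLD = 85.0    # loan-to-value ratio, in percent
SCORE_THRESHOLD = 500   # PERFORMCNSSCORE cutoff


@dataclass
class Applicant:
    """Hypothetical record; field names are assumptions for illustration."""
    ltv: float               # loan-to-value ratio in percent
    perform_cns_score: int   # value of the PERFORMCNSSCORE field


def risk_flags(applicant):
    """Return the list of screening criteria the applicant triggers."""
    flags = []
    if applicant.ltv > LTV_THRESHOLD:
        flags.append("LTV>85%")
    if applicant.perform_cns_score < SCORE_THRESHOLD:
        flags.append("PERFORMCNSSCORE<500")
    return flags


def is_high_risk(applicant, require_both=False):
    """Flag an applicant as high risk.

    require_both=False flags on either criterion; True requires both.
    Which reading the paper intends is not stated (assumption).
    """
    flags = risk_flags(applicant)
    return len(flags) == 2 if require_both else len(flags) > 0


# Usage with made-up applicants.
a = Applicant(ltv=90.0, perform_cns_score=620)
b = Applicant(ltv=88.5, perform_cns_score=450)
print(risk_flags(a), is_high_risk(a), is_high_risk(a, require_both=True))
print(risk_flags(b), is_high_risk(b), is_high_risk(b, require_both=True))
```

Returning the triggered criteria alongside the boolean keeps the rule auditable, which fits the abstract's emphasis on interpretability.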