Loan default prediction is a core problem in financial risk management, directly affecting credit decisions and the efficiency of capital allocation. This study uses 200,000 anonymized loan records, applying feature engineering (e.g., standardization of credit history length, outlier correction) and SMOTE oversampling to improve data quality. The performance of algorithms such as XGBoost and LightGBM was compared, with SHAP and LIME used to enhance model interpretability. Results show that XGBoost achieved the best performance (test set AUC=0.89), significantly outperforming logistic regression (AUC=0.62). Key risk factors include credit score (SHAP mean value=0.32), high loan-to-value ratio (LTV>85%, OR=2.1 for default risk), and self-employed status (default probability 1.8 times higher than for salaried individuals). The model combines high accuracy with consistency with business logic, and the study recommends LTV>85% and PERFORMCNSSCORE<500 as screening criteria for high-risk customers, giving financial institutions a dynamic risk management tool. These findings not only validate the effectiveness of advanced machine learning in credit risk assessment but also offer actionable insights for refining lending policies and reducing potential losses. Future research could incorporate macroeconomic variables to enhance dynamic prediction capabilities.
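The recommended screening criteria can be illustrated with a short Python sketch. This is not the study's code: the `Applicant` type, the function names, and the `require_both` switch are hypothetical, and the abstract does not state whether the two thresholds are meant to be combined with AND or OR, so both readings are exposed. The `odds_ratio` helper shows the standard 2x2-table calculation behind a figure such as OR=2.1; any counts passed to it here are illustrative, not data from the paper.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract; everything else is an assumption.
LTV_THRESHOLD = 85.0      # loan-to-value ratio, in percent
SCORE_THRESHOLD = 500.0   # PERFORMCNSSCORE cut-off


@dataclass
class Applicant:
    """Hypothetical record holding only the two screening features."""
    ltv: float                # loan-to-value ratio in percent
    perform_cns_score: float  # bureau score (PERFORMCNSSCORE)


def is_high_risk(applicant: Applicant, require_both: bool = False) -> bool:
    """Flag an applicant using the LTV>85% and PERFORMCNSSCORE<500 criteria.

    require_both=False flags when either criterion is met (assumed default);
    require_both=True flags only when both are met.
    """
    high_ltv = applicant.ltv > LTV_THRESHOLD
    low_score = applicant.perform_cns_score < SCORE_THRESHOLD
    if require_both:
        return high_ltv and low_score
    return high_ltv or low_score


def odds_ratio(exposed_default: int, exposed_ok: int,
               unexposed_default: int, unexposed_ok: int) -> float:
    """Odds ratio from a 2x2 table: (a/b) / (c/d) = (a*d) / (b*c)."""
    return (exposed_default * unexposed_ok) / (exposed_ok * unexposed_default)


if __name__ == "__main__":
    sample = Applicant(ltv=90.0, perform_cns_score=700.0)
    print(is_high_risk(sample))                     # either criterion
    print(is_high_risk(sample, require_both=True))  # both criteria
```

Keeping the thresholds as named constants makes it easy to re-tune the rule if the cut-offs are revised on new data.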
Xiaohui Wu
Tianjin University of Commerce
Theoretical and Natural Science
China University of Geosciences
Xiaohui Wu (Wed,) studied this question.
synapsesocial.com/papers/68c1d5e554b1d3bfb60f8886 — DOI: https://doi.org/10.54254/2753-8818/2025.ad26291