September 28, 2025 | Open Access

Advanced loan default prediction models using Machine Learning boosting algorithms

Key Points

The research achieved ~98% accuracy in predicting loan defaults using ML boosting algorithms.
XGBoost and LightGBM significantly outperformed traditional logistic regression in accuracy.
Data-driven modeling combined with robust cross-validation methods ensured reliable evaluation of credit risk.
This work emphasizes the potential of innovative ML models to enhance tools for financial stability and lending.

Abstract

This research presents a data-driven approach that uses machine learning (ML) models with boosting algorithms to achieve ~98% prediction accuracy for credit risk evaluation. The study used a public, loan-level dataset from Freddie Mac for the post-2020 period and identified multiple credit risk factors that influenced the likelihood of loan default. The research examined whether ML boosting algorithms, including Gradient Boosting, XGBoost, and LightGBM, outperformed Logistic Regression in predictive performance. The paper proposes novel ML-based credit risk algorithms that address data imbalance, hyperparameter optimization, and robust cross-validation to achieve reliable estimation. For comprehensiveness and robustness, model performance was evaluated using a suite of metrics: accuracy, sensitivity, specificity, true and false positive rates, AUC, F1 scores, and ROC analysis. The empirical results showed that ensemble methods consistently achieved higher accuracy than single-model approaches. XGBoost and LightGBM were the top performers, with 98% accuracy after optimization and 5-fold cross-validation. ML models using boosting algorithms, especially XGBoost and LightGBM, distinguished between “good” and “bad” loans far more accurately than the traditional logit model, without exhibiting signs of overfitting. Because these models outperform current models for predicting loan defaults, the results carry significant implications for lenders, regulators, and policymakers in the financial industry: they provide more robust tools for credit risk modeling, facilitate the development of FinTech-driven lending solutions, and support the preservation of financial stability.
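
Several of the metrics named in the abstract follow directly from a binary confusion matrix. The sketch below is illustrative only and is not the paper's code: the function name `binary_metrics`, the `BinaryMetrics` container, and the ten toy labels are all hypothetical, and a "bad" loan (default) is assumed to be encoded as 1. AUC and ROC analysis need predicted scores rather than hard labels, so they are left out.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float          # true positive rate (share of defaults caught)
    specificity: float          # true negative rate (share of good loans kept)
    false_positive_rate: float  # 1 - specificity
    precision: float
    f1: float


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1) -> BinaryMetrics:
    """Compute confusion-matrix metrics for hard 0/1 predictions.

    Hypothetical helper: `positive` marks the default ("bad" loan) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f1_denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / f1_denominator if f1_denominator else 0.0

    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        sensitivity=sensitivity,
        specificity=specificity,
        false_positive_rate=_ratio(fp, fp + tn),
        precision=precision,
        f1=f1,
    )


# Toy, made-up labels: 10 loans, 2 of which actually defaulted (label 1).
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(binary_metrics(toy_true, toy_pred))
```

On the toy labels, accuracy is well above sensitivity because defaults are the minority class, which is one reason an imbalanced credit dataset is usually assessed with a suite of metrics rather than accuracy alone.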
