What type of study is this?

This is a quantitative study.

October 18, 2025 · Open Access

Loan Default Prediction Using Machine Learning Algorithms

Zhongming Kang (Universiti Sains Malaysia), Sin Yin Teh (Universiti Sains Malaysia), Shubin Tan (Guangdong 999 Brain Hospital)

Key Points

LightGBM achieved the highest accuracy (0.9764) among the four models compared for predicting loan default.
Feature importance analysis revealed that interest and credit type are among the key predictors of loan default.
Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC AUC.
The Synthetic Minority Over-sampling Technique (SMOTE) was applied during model training to address class imbalance.
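
To make the oversampling step concrete, the sketch below illustrates the core idea behind SMOTE: each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration and not the study's implementation. The function name `smote_sketch`, the parameter values, and the toy rows are assumptions made for this example; in practice a maintained library implementation would normally be used, and oversampling would be applied to the training split only.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    row and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: two numeric features per row, invented for illustration.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_rows = smote_sketch(minority_rows, n_new=6)
```

Because every synthetic row is an interpolation between two real minority rows, the new points stay inside the region already occupied by the minority class instead of being exact duplicates.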

Abstract

Financial institutions constantly face the risk of borrower default, which can result in significant financial losses. An appropriate predictive model for loan default is essential to reduce this risk and minimise those losses. This study aims to identify the most suitable machine learning model for predicting loan default by comparing four models: Random Forest, Decision Tree, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). It also examines the key features influencing loan default prediction. The dataset, sourced from Kaggle, consists of 148,670 rows with 34 features. Because class imbalance is common in loan default prediction, the Synthetic Minority Over-sampling Technique (SMOTE) is applied during model training to enhance predictive performance. Model performance is evaluated using five metrics: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC). The results indicate that LightGBM performs best among the four models, with the highest accuracy (0.9764), together with a precision of 0.9747 and a recall of 0.9503. Feature importance is analysed using permutation importance, which identifies interest, credit type, interest rate spread, and upfront charges as the four most significant features for loan default prediction. These findings provide useful information for financial institutions, aiding risk assessment and decision-making to mitigate potential losses.
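
The evaluation and feature-importance steps described in the abstract can be sketched in plain Python. The example computes accuracy, precision, recall, and F1-score from confusion-matrix counts, then estimates permutation importance as the average drop in accuracy when one feature column is shuffled. It is a minimal sketch under stated assumptions: `toy_predict` is a hypothetical stand-in for a trained classifier such as LightGBM, the data are invented, and ROC AUC is omitted because it requires predicted scores, not hard labels.

```python
import random

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = classification_metrics(y, [predict(r) for r in X])["accuracy"]
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            preds = [predict(r) for r in X_perm]
            drops.append(baseline - classification_metrics(y, preds)["accuracy"])
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in model: predicts default when feature 0 exceeds 0.5
# and ignores feature 1 entirely.
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0

# Invented toy data for illustration only.
X = [[0.1, 5.0], [0.9, 3.0], [0.7, 4.0], [0.2, 1.0], [0.8, 2.0], [0.3, 6.0]]
y = [0, 1, 1, 0, 1, 0]
metrics = classification_metrics(y, [toy_predict(r) for r in X])
importances = permutation_importance_sketch(toy_predict, X, y)
```

A feature the model never uses, such as feature 1 here, receives an importance of zero because shuffling it cannot change any prediction, while a feature the model depends on shows a non-negative drop in accuracy.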
