What does this research mean for the field?

SMOTE-ENN combined with a multilayer perceptron (MLP) achieves the highest F1-score, 0.928, for credit risk prediction on the German dataset. Novelty: confirmatory. Consensus alignment: supports consensus.

What question did this study set out to answer?

Evaluate and compare the performance of machine learning and deep learning models for credit risk prediction.

March 13, 2026 | Open Access

Performance Evaluation of Machine Learning and Deep Learning Models for Credit Risk Prediction

Key Points

Compared deep learning architectures and traditional machine learning models on imbalanced credit risk datasets.
Used three resampling techniques to address class imbalance: SMOTE, ENN, and the hybrid SMOTE-ENN.
Evaluated MLP, CNN, LSTM, GRU, logistic regression, decision tree, SVM, random forest, adaptive boosting, and extreme gradient boosting.
MLP with SMOTE-ENN achieved the highest F1-score of 0.928 (accuracy 95.4%) on the German dataset.
Random forest with SMOTE-ENN attained the best F1-score of 0.789 (accuracy 82.1%) on the Taiwanese dataset.
SHAP was used to interpret model predictions and identify the key features driving credit default.
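
The key points report both F1-score and accuracy. As a minimal sketch of why the two can diverge on imbalanced data, the snippet below computes both from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def f1_and_accuracy(tp, fp, fn, tn):
    """Return (F1, accuracy) for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return f1, accuracy


# Hypothetical portfolio: 1000 borrowers, 100 of whom default (positive class).
# The model flags 50 borrowers and is right about 40 of them.
f1, accuracy = f1_and_accuracy(tp=40, fp=10, fn=60, tn=890)
print(f"F1 = {f1:.3f}, accuracy = {accuracy:.3f}")
```

In this made-up case the accuracy is high because the majority class dominates, while the F1-score reflects the 60 missed defaults. That gap is the usual reason F1 is reported alongside accuracy on imbalanced credit data.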

Abstract

Credit risk prediction is essential for financial institutions to effectively assess the likelihood of borrower defaults and manage associated risks. This study presents a comparative analysis of deep learning architectures and traditional machine learning models on imbalanced credit risk datasets. To address class imbalance, we employ three resampling techniques: Synthetic Minority Over-sampling Technique (SMOTE), Edited Nearest Neighbors (ENN), and the hybrid SMOTE-ENN. We evaluate the performance of various models, including multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), gated recurrent unit (GRU), logistic regression, decision tree, support vector machine (SVM), random forest, adaptive boosting, and extreme gradient boosting. The analysis reveals that SMOTE-ENN combined with MLP achieves the highest F1-score of 0.928 (accuracy 95.4%) on the German dataset, while SMOTE-ENN with random forest attains the best F1-score of 0.789 (accuracy 82.1%) on the Taiwanese dataset. SHapley Additive exPlanations (SHAP) are employed to enhance model interpretability, identifying key drivers of credit default. These findings provide actionable guidance for developing transparent, high-performing, and robust credit risk assessment systems.
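
To make the hybrid resampling step concrete, here is a minimal pure-Python sketch of SMOTE followed by ENN. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy data, the choice of k = 3, and the majority-vote editing rule are all assumptions, and a real experiment would more likely use a maintained library implementation.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(i, X, candidates, k):
    """Indices of the k candidates closest to X[i], excluding i itself."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: sq_dist(X[i], X[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority samples until the minority matches the largest class.

    Each synthetic point is a random interpolation between a minority sample
    and one of its k nearest minority neighbours (needs >= 2 minority samples).
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = max(Counter(y).values()) - len(minority_idx)
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = rng.choice(nearest(i, X, minority_idx, k))
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Drop every sample whose label disagrees with the majority of its k neighbours."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in nearest(i, X, all_idx, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k=3, seed=0):
    """Hybrid resampling: oversample with SMOTE, then clean with ENN."""
    X_s, y_s = smote(X, y, minority, k, seed)
    return enn(X_s, y_s, k)


# Toy data (hypothetical): class 0 = non-default, class 1 = default.
# The last class-0 point sits inside the class-1 cluster and acts as noise.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [5.5, 5.5],
     [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_res, y_res = smote_enn(X, y, minority=1)
print(Counter(y), "->", Counter(y_res))
```

On this toy set, SMOTE raises the minority class from 3 to 7 samples and ENN then removes the single majority point embedded in the minority cluster, leaving 6 majority and 7 minority samples. Resampling of this kind is normally applied to the training split only, so that evaluation data keep the original class distribution.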

Mapfumo et al. (2026) studied this question.

synapsesocial.com/papers/69b3ace502a1e69014ccef7e https://doi.org/10.3390/jrfm19030210
