What question did this study set out to answer?

The aim is to develop a framework that balances predictive accuracy and interpretability in credit default risk models.

March 12, 2026 | Open Access

Predictive Modelling of Credit Default Risk Using Machine Learning and Ensemble Techniques

Key Points

Used the German Credit Dataset for model training and validation.
Preprocessed the data with feature encoding, scaling, and SMOTE to handle class imbalance.
Combined four base models (logistic regression, Random Forest, XGBoost, and Multilayer Perceptron) in a Stacked Ensemble with a logistic regression meta learner.
Applied SHAP analysis to interpret predictions at both the global and local level.
The Stacked Ensemble achieved an AUC of 0.761, the highest among all tested models.
Its precision was 0.783 and its recall 0.806, giving an F1 score of 0.794.
Random Forest (AUC = 0.749) outperformed XGBoost (AUC = 0.733), contrary to conventional expectations.
Class-imbalance handling and threshold optimisation reduced false positives from 39 to 16, improving the model's practical applicability.
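
The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. A minimal sketch follows; the function name is illustrative and not taken from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported for the Stacked Ensemble.
f1 = f1_from_precision_recall(0.783, 0.806)
print(round(f1, 3))  # 0.794
```

The result agrees with the reported F1 score to three decimal places.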

Abstract

This study develops a hybrid framework integrating ensemble learning with explainable artificial intelligence to address the methodological challenge of balancing predictive accuracy and interpretability in credit risk model comparison. Using the German Credit Dataset, we implemented a comprehensive preprocessing pipeline, including feature encoding, scaling, and SMOTE for class imbalance handling. Four base models, logistic regression, Random Forest, XGBoost, and Multilayer Perceptron, were combined through a Stacked Ensemble with a logistic regression meta learner. The ensemble demonstrated strong performance, achieving an AUC of 0.761, precision of 0.783, recall of 0.806, and an F1 score of 0.794, which represented the highest scores among all models tested. Notably, Random Forest (AUC = 0.749) surpassed XGBoost (AUC = 0.733), challenging conventional algorithmic hierarchies. SHAP analysis provided transparent global and local interpretability, identifying Current Account status (SHAP = 0.153), Loan Duration (0.064), and Savings Account (0.063) as dominant predictor variables. Class-imbalance handling and threshold optimisation enhanced practical utility by reducing false positives from 39 to 16, thereby aligning with financial risk priorities. The framework provides a reproducible methodological pipeline for systematically comparing credit scoring approaches, demonstrating how predictive performance can be evaluated alongside interpretability considerations within a benchmark dataset context.
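
The stacking and threshold steps described above might be sketched as follows with scikit-learn. This is a rough illustration under several assumptions, not the authors' implementation:

- The data are synthetic (1,000 rows, 20 features, roughly a 70/30 class split) and only stand in for the German Credit Dataset.
- GradientBoostingClassifier is substituted for XGBoost so that the example depends on scikit-learn alone.
- SMOTE and SHAP are omitted.
- All hyperparameters and the two thresholds are arbitrary.
- Class 1 is treated as the positive class; the abstract does not say which class the study treats as positive.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): NOT the German Credit Dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Four base learners. Scaling is wrapped into the pipelines of the
# models that are sensitive to feature scale.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    # Stand-in for XGBoost (assumption, to avoid an extra dependency).
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    )),
]

# The logistic regression meta learner is trained on out-of-fold
# predicted probabilities from the base models (5-fold CV).
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

proba = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)


def false_positives(threshold: float) -> int:
    """Count test-set false positives at a given decision threshold."""
    pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
    return int(fp)


fp_default = false_positives(0.5)  # default decision threshold
fp_strict = false_positives(0.7)   # stricter, arbitrary threshold
print(f"AUC={auc:.3f}  FP@0.5={fp_default}  FP@0.7={fp_strict}")
```

Raising the threshold can only remove positive predictions, so the false-positive count never increases, usually at the cost of recall. If oversampling such as SMOTE were added, it would normally be fitted on the training folds only, so that synthetic samples do not leak into the evaluation data.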
