This study develops a hybrid framework that integrates ensemble learning with explainable artificial intelligence to address the methodological challenge of balancing predictive accuracy and interpretability when comparing credit risk models. Using the German Credit Dataset, we implemented a preprocessing pipeline comprising feature encoding, scaling, and SMOTE for class-imbalance handling. Four base models (logistic regression, Random Forest, XGBoost, and a Multilayer Perceptron) were combined in a stacked ensemble with a logistic regression meta-learner. The ensemble achieved an AUC of 0.761, precision of 0.783, recall of 0.806, and an F1 score of 0.794, the highest scores among all models tested. Notably, Random Forest (AUC = 0.749) surpassed XGBoost (AUC = 0.733), challenging conventional algorithmic hierarchies. SHAP analysis provided transparent global and local interpretability, identifying Current Account status (SHAP = 0.153), Loan Duration (0.064), and Savings Account (0.063) as the dominant predictor variables. Class-imbalance handling and threshold optimisation enhanced practical utility by reducing false positives from 39 to 16, in line with financial risk priorities. The framework provides a reproducible methodological pipeline for systematically comparing credit scoring approaches, demonstrating how predictive performance can be evaluated alongside interpretability within a benchmark dataset context.
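The relationship between the reported precision, recall, and F1 score, and the way raising a decision threshold trades false positives against false negatives, can be illustrated with a minimal sketch. The helper names `f1_score` and `confusion_counts`, the toy labels and scores, and the two thresholds are hypothetical and chosen only for illustration; they are not the study's data, its chosen threshold, or its code, and which class counts as "positive" is likewise an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Consistency check on the ensemble metrics reported in the abstract.
reported_f1 = f1_score(0.783, 0.806)
print(round(reported_f1, 3))  # 0.794

# Hypothetical toy data (NOT from the study): true labels and model scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.70, 0.65, 0.60, 0.55, 0.45, 0.40, 0.30, 0.10]

# Raising the threshold removes false positives but adds false negatives.
for threshold in (0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    print(f"threshold={threshold}: tp={tp} fp={fp} fn={fn} tn={tn}")
```

On the toy data, moving the threshold from 0.5 to 0.7 lowers the false-positive count from 2 to 0 while the false-negative count rises from 1 to 2, which is the kind of trade-off a threshold search has to weigh against the cost of each error type.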
Mathibela et al. (Tue,) studied this question.