Purpose: This study evaluates the efficacy of modern machine learning classifiers (random forest, gradient boosting trees, decision trees, support vector machines and logistic regression) in forecasting corporate bankruptcy among Italian firms, with the goal of surpassing traditional credit-scoring approaches by leveraging rich financial data.

Design/methodology/approach: Using a comprehensive panel of 1,826,157 firm–year observations (532,255 active; 76,464 bankrupt) from 1980 to 2019, the authors compare models trained on different data configurations, addressing class imbalance through undersampling and advanced variants of the synthetic minority over-sampling technique (SMOTE). Models are validated on held-out samples, regional subsets and an out-of-time test (2016–2017), with performance gauged by area under the curve (AUC), F1-score, precision, recall and specificity.

Findings: Ensemble methods (random forest and gradient boosting) outperform the other classifiers, particularly when using raw accounting inputs, achieving AUCs near 0.99 and F1-scores up to 0.98. Resampling enhances robustness without diminishing predictive power, and variable-importance analysis identifies capital-structure metrics as key early warning indicators.

Originality/value: To the best of the authors’ knowledge, this is the first large-scale Italian bankruptcy study to juxtapose ratio-based models with high-dimensional raw data under multiple SMOTE variants. It shows that comprehensive financial statement variables markedly improve predictive accuracy and offers novel insights for both researchers and risk practitioners.
Gabrielli et al. (Mon,) studied this question.
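To make the evaluation vocabulary in the abstract concrete, the following is a minimal sketch of random undersampling and of the reported metrics (precision, recall, specificity, F1-score and AUC) in plain Python. It is not the authors' pipeline: the function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, and SMOTE itself is not shown.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]

def confusion_metrics(y_true, y_pred):
    """Precision, recall, specificity and F1 for binary labels (1 = bankrupt)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one example
    of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 3 bankrupt firms (1) and 7 active firms (0),
# with made-up model scores. None of these values come from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = confusion_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
balanced_rows, balanced_labels = undersample(list(range(10)), y_true)
```

AUC is computed from the raw scores and so does not depend on the threshold, whereas precision, recall, specificity and F1 all change when the threshold moves. In practice, resampling such as `undersample` would be applied to the training split only, so that held-out and out-of-time tests keep the original class balance.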