What question did this study set out to answer?

The aim is to evaluate the effectiveness of machine learning classifiers in predicting corporate bankruptcy using extensive financial data.

February 2, 2026 | Open Access

Corporate financial distress prediction: a machine learning approach in the era of big data

Gianluca Gabrielli (University of Parma), Andrea Melioli (University of Parma), Flavio Bertini (University of Parma)

Key Points

The aim is to evaluate the effectiveness of machine learning classifiers in predicting corporate bankruptcy using extensive financial data.
Analyzed 1,826,157 firm-year observations from 1980 to 2019.
Compared various machine learning models, including random forest and gradient boosting.
Addressed class imbalance through resampling techniques like SMOTE.
Validated models on held-out samples, regional subsets, and out-of-time tests.
Ensemble methods outperformed other classifiers with AUCs near 0.99.
F1-scores reached up to 0.98 with raw accounting inputs.
Resampling techniques enhanced model robustness.
Capital-structure metrics emerged as critical early warning indicators.
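
To make the resampling point concrete, here is a minimal sketch of the basic SMOTE idea: each synthetic minority (bankrupt) sample is an interpolation between a real minority sample and one of its nearest minority neighbours. This is an illustration only, not the authors' implementation; the function name `smote_oversample`, the parameter defaults, and the two-feature toy rows are assumptions, and the study itself reports using several SMOTE variants that are not specified here.

```python
import random
from typing import List, Sequence


def smote_oversample(
    minority: Sequence[Sequence[float]],
    n_synthetic: int,
    k: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """Create synthetic minority rows by interpolating toward near neighbours.

    Hypothetical helper for illustration; covers only the basic SMOTE step.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class with two made-up features per firm-year.
bankrupt_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote_oversample(bankrupt_rows, n_synthetic=10, k=2)
print(len(new_rows))
```

Because every synthetic row lies on a segment between two real minority rows, oversampling of this kind fills in the minority region rather than duplicating existing observations.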

Abstract

Purpose: This study evaluates the efficacy of modern machine learning classifiers (random forest, gradient boosting trees, decision trees, support vector machines and logistic regression) in forecasting corporate bankruptcy among Italian firms, with the goal of surpassing traditional credit-scoring approaches by leveraging rich financial data.

Design/methodology/approach: Using a comprehensive panel of 1,826,157 firm–year observations (532,255 active; 76,464 bankrupt) from 1980 to 2019, the authors compare models trained on different data configurations, addressing class imbalance through undersampling and advanced variants of the synthetic minority over-sampling technique (SMOTE). Models are validated on held-out samples, regional subsets and an out-of-time test (2016–2017), with performance gauged by area under the curve (AUC), F1-score, precision, recall and specificity.

Findings: Ensemble methods (random forest and gradient boosting) outperform other classifiers, particularly when using raw accounting inputs, achieving AUCs near 0.99 and F1-scores up to 0.98. Resampling enhances robustness without diminishing predictive power, and variable-importance analysis underscores capital-structure metrics as key early warning indicators.

Originality/value: To the best of the authors’ knowledge, this is the first large-scale Italian bankruptcy study to juxtapose ratio-based models with high-dimensional raw data under multiple SMOTE variants. It shows that comprehensive financial statement variables markedly improve predictive accuracy and offers novel insights for both researchers and risk practitioners.
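
The evaluation metrics named in the abstract can be stated precisely in a few lines. The sketch below computes precision, recall, specificity and F1 from binary predictions, and AUC as the probability that a randomly chosen bankrupt firm receives a higher score than a randomly chosen active one. The function names and the toy labels and scores are assumptions made for illustration; they are not data or code from the study.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, specificity and F1 for binary labels (1 = bankrupt)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 bankrupt and 5 active firms with made-up model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(labels, preds))
print(roc_auc(labels, scores))
```

The pairwise AUC is quadratic in sample size, so it suits small illustrations only; on a panel of the size described here a rank-based implementation would be the practical choice. Reporting specificity and F1 alongside AUC matters under class imbalance, because a high AUC alone does not show how many active firms are wrongly flagged at a given threshold.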
