August 6, 2025Open Access

Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data

Key Points

The study develops a SHAP-enhanced framework for predicting financialization using high-dimensional data with complex interactions.
XGBoost achieved an RMSE of 0.082 and 99.34% explained variance, indicating effective model performance.
Key predictors impacting financialization include CSR-related factors, SOE status, and ownership concentration.
Significant changes in factor importance were noted during the COVID-19 pandemic, revealing critical shifts in predictive relationships.

Abstract

High-dimensional prediction problems with complex non-linear feature interactions present significant algorithmic challenges in machine learning, particularly when dealing with imbalanced datasets and multicollinearity issues. This study proposes an innovative Shapley Additive Explanations (SHAP) -enhanced machine learning framework that integrates SHAP with advanced ensemble methods for interpretable financialization prediction. The methodology simultaneously addresses high-dimensional feature selection using 40 independent variables (19 CSR-related and 21 financialization-related), multicollinearity issues, and model interpretability requirements. Using a comprehensive dataset of 25, 642 observations from 3776 Chinese A-share companies (2011–2022), we implement nine optimized machine learning algorithms with hyperparameter tuning via the Hippopotamus Optimization algorithm and five-fold cross-validation. XGBoost demonstrates superior performance with 99. 34% explained variance, achieving an RMSE of 0. 082 and R2 of 0. 299. SHAP analysis reveals non-linear U-shaped relationships between key predictors and financialization outcomes, with critical thresholds at approximately 10 for CSRSocR, 1. 5 for CSRS, and 5 for CSRCV. SOE status, EPU, ownership concentration, firm size, and housing prices emerge as the most influential predictors. Notable shifts in factor importance occur during the COVID-19 pandemic period (2020–2022). This work contributes a scalable, interpretable machine learning architecture for high-dimensional financial prediction problems, with applications in risk assessment, portfolio optimization, and regulatory monitoring systems.

Interpretable Machine Learning Framework for Corporate Financialization Prediction: A SHAP-Based Analysis of High-Dimensional Data

Key Points

Abstract

Cite This Study