What question did this study set out to answer?

To identify the most effective machine learning model for predicting customer churn in the banking industry.

March 4, 2026 | Open Access

Machine Learning Approaches in Banking Industry for Customer Churn Analysis

Key points

Aimed to identify the most effective machine learning model for predicting customer churn in the banking industry.
Compared supervised learning techniques: logistic regression, random forests, decision trees.
Used J48 decision tree, Random Forest, and Bagging for model development.
Applied k-fold cross-validation for parameter adjustment in model building.
Utilized synthetic minority oversampling to address class imbalance.
Conducted experiments on a dataset with 9978 instances and 11 features.
The J48 classifier achieved an accuracy of 90%, the highest among the models.
J48 also showed the best recall and F-measure for customer churn prediction.
Bagging and Random Forest were found effective but less so than J48.
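
The key points rank the classifiers by accuracy, recall, and F-measure. As a reminder of how those figures relate to each other, here is a minimal sketch that computes them for the churn class from a list of true and predicted labels. The function name `churn_metrics`, the label coding (1 = churned, 0 = retained), and the toy labels are assumptions for illustration only; they are not the study's data or its tooling.

```python
def churn_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-measure for the positive class.

    `positive` is the label that marks a churned customer (assumed to be 1 here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy labels (made up): 3 churners and 7 retained customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = churn_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On an imbalanced churn dataset, accuracy alone can look good even when most churners are missed, which is presumably why recall and F-measure are reported alongside it.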

Abstract

This study explores the application of machine learning algorithms to customer churn prediction in the banking industry. By comparing supervised learning techniques such as logistic regression, random forests, and decision trees, it aims to identify the most effective model for enhancing customer retention strategies. The research contributes to the growing field of AI in finance and supports data-driven decision-making in customer relationship management. Three classifiers were chosen to develop the learning model: the decision tree-based J48, Random Forest, and Bagging. Models were built both with the dataset split into training and testing sets and with k-fold cross-validation using varying k for parameter adjustment. The experiments used a dataset of 9978 instances and 11 features collected from the Cooperative Bank of Oromia. To compensate for the influence of class imbalance on predictive performance, the synthetic minority oversampling technique was applied. The experimental process comprised preprocessing, feature selection, modeling, and evaluation. To identify which algorithm works best for customer churn analysis, we conducted several model-building experiments. The best results were obtained with the J48 model built using a 66% percentage split: its accuracy was 90%, and it had the highest recall and F-measure. The J48 classifier is therefore found to be the best for predicting customer churn in the banking sector, followed by the Bagging and Random Forest classifiers, respectively.
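
The abstract names synthetic minority oversampling as the remedy for class imbalance but does not describe its mechanics. The core idea can be sketched as follows: pick a minority-class sample, pick one of its nearest minority-class neighbours, and create a new point somewhere on the line segment between them. This is a simplified illustration, not the study's implementation. The function name `smote_sketch`, the choice of `k=3`, Euclidean distance, the random seed, and the two-feature toy points are all assumptions, and the sketch handles numeric features only.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    minority: list of equal-length numeric tuples (minority-class rows).
    k: number of nearest minority neighbours to choose from (assumed value).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class rows with two made-up numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
synthetic = smote_sketch(minority, n_new=6)
for point in synthetic:
    print(point)
```

Because each synthetic row is a blend of two real minority rows, it stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training portion only, so that the test set still reflects the original class balance.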
