What type of study is this?

This is a quantitative study.

October 27, 2025 | Open Access

Detection of Bank Customer Churn Using Neural Network and Voting Classifier Ensembles

Key Points

The ensemble approaches slightly outperformed individual models, with a soft voting classifier achieving an accuracy of 89.46%.
Integration of synthetic data using CTGAN and improved preprocessing methods enhanced the model's generalization capabilities.
Prominent machine learning techniques like Random Forest and XGBoost were included alongside neural networks for better predictive accuracy.
This work contributes to banking strategies by leveraging machine learning techniques to accurately detect and reduce customer churn.

Abstract

Customer churn is the loss of business clients to a competitor. Since keeping current clients is more economical than finding new ones, customer retention measures such as churn detection are now essential aspects of modern banking strategy. However, many existing studies rely heavily on conventional machine learning approaches such as Support Vector Machines, Logistic Regression, and Random Forest, often neglecting the deeper learning capabilities of neural networks. In addition, the repeated use of the same small dataset across banking studies may limit improvements in the models’ generalisation. To address these gaps, this study presents a method that combines a deep learning model for customer churn detection with soft and hard voting classifier ensembles, built from the best-performing models of previous years for comparison of results, and supported by synthetic data augmentation for model improvement. The study utilised a secondary banking churn dataset from Kaggle, which contained 10,000 unique customer records. To address the dataset's limitations, a Conditional Tabular Generative Adversarial Network (CTGAN) model was used to generate an additional 10,000 records, expanding the dataset used for the study to 20,000 rows. Data preprocessing was carried out before training, including oversampling with the Synthetic Minority Oversampling Technique (SMOTE). Model development and analysis were implemented in Python, using prominent libraries and frameworks, on Google Colab. A Feedforward Neural Network and soft and hard voting classifiers were developed. The voting classifier ensembles integrated three prominent classifiers: Random Forest, XGBoost, and Logistic Regression. Performance was evaluated using Accuracy, F1 Score, and Area Under the ROC Curve (AUC) as metrics.
Results show that the Feedforward Neural Network achieved strong predictive performance, with an accuracy of 88.23%, an F1 Score of 87.83%, and an AUC of 94.73%. The ensemble approaches performed slightly better: the soft voting classifier delivered the best results, with an accuracy of 89.46%, an F1 Score of 88.92%, and an AUC of 95.40%, showing the advantage of combining multiple models to leverage complementary strengths. In comparison with past studies, the proposed models did not surpass the very best reported outcomes. However, they remain highly competitive, achieving performance levels that are on par with or exceed many earlier works. The contribution of this work is to show how synthetic data augmentation, enhanced preprocessing, deep learning techniques, and machine learning ensembles can improve churn detection in banking studies. Banking institutions can utilise the results from this study to accurately detect churn, supporting proactive customer retention strategies, targeted marketing, and personalised financial services, thereby reducing revenue losses.
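The difference between the hard and soft voting ensembles described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.5 decision threshold, and the three example probabilities (standing in for Random Forest, XGBoost, and Logistic Regression outputs) are all hypothetical and chosen only to show how the two rules can disagree.

```python
def hard_vote(probabilities, threshold=0.5):
    """Hard voting: each model casts one class label; the majority wins."""
    labels = [int(p >= threshold) for p in probabilities]
    # Class 1 (churn) wins only if more than half of the models predict it.
    return int(sum(labels) * 2 > len(labels))


def soft_vote(probabilities, threshold=0.5):
    """Soft voting: average the predicted probabilities, then threshold."""
    mean_probability = sum(probabilities) / len(probabilities)
    return int(mean_probability >= threshold)


# Hypothetical churn probabilities for one customer from three models.
# Two models lean slightly towards "stays"; one is very confident of churn.
customer = [0.45, 0.48, 0.95]

hard_prediction = hard_vote(customer)  # majority of labels [0, 0, 1]
soft_prediction = soft_vote(customer)  # mean probability is about 0.627

print("hard voting:", hard_prediction)
print("soft voting:", soft_prediction)
```

In this made-up case, hard voting predicts no churn because two of the three labels are 0, while soft voting predicts churn because the one confident model raises the average probability above the threshold. Soft voting uses each model's confidence, which is one plausible reason it can edge out hard voting, as reported for the soft voting classifier above.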
