What question did this study set out to answer?

This research aims to develop an effective predictive model to classify customer churn in the telecommunications industry.

May 29, 2026 | Open Access

Machine Learning–Based Customer Churn Prediction in Telecommunication Industry

Key Points

Utilized the IBM Telco Customer Churn dataset containing 5634 preprocessed records.
Applied four machine learning models: decision tree, random forest, XGBoost, and logistic regression.
Evaluated models using accuracy, precision, recall, F1-score, and ROC-AUC metrics.
Random forest achieved an accuracy of 84.73% and precision of 85.20% for churn.
Recall for churn was 84.07% with an F1-score of 84.62%.
ROC-AUC for the random forest model was 93.86%.
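
The metrics listed above can be illustrated with a minimal sketch in plain Python. This is not the study's code: the labels and scores below are made-up toy values, the 0.5 decision threshold is an assumption, and the helper names (`classification_metrics`, `roc_auc`) are hypothetical. Churn is treated as the positive class (1).

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (churn = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random churner is scored above a
    random non-churner (ties count as half). Quadratic in the number of
    samples, which is fine for a small illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not taken from the dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]  # predicted churn probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the ranking of the scores alone, which is one reason the two kinds of metric are usually reported side by side.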

Abstract

Customer churn remains a major challenge in the telecommunications industry, where retaining existing customers is significantly more cost-effective than acquiring new ones. This study uses the publicly available IBM Telco Customer Churn dataset, consisting of 5634 preprocessed records, to develop a predictive model for customer churn classification. The dataset was cleaned and prepared through preprocessing steps, including handling missing values and encoding categorical variables, to ensure suitability for machine learning analysis. Four classical machine learning models (decision tree, random forest, XGBoost, and logistic regression) were selected for their proven effectiveness in classification tasks, their interpretability, and their strong performance on the structured tabular datasets commonly used in churn prediction studies. The models were evaluated using standard performance metrics: accuracy, precision, recall, F1-score, and ROC-AUC. To improve predictive performance and stability, the models were tuned. The random forest model achieved an accuracy of 84.73%, a precision of 85.20% for churn, a recall of 84.07% for churn, an F1-score of 84.62%, and a ROC-AUC of 93.86%. The results demonstrate that well-tuned classical machine learning models can produce reliable and robust churn prediction performance. This study contributes a systematic evaluation of classical models on a benchmark dataset, with standardized preprocessing and comprehensive performance analysis for telecom churn prediction.
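
The abstract names two preprocessing steps, handling missing values and encoding categorical variables, without saying how they were carried out. The sketch below shows one common way to do each in plain Python. Median imputation and one-hot encoding are assumptions chosen for illustration, not the study's stated method. The column names (`TotalCharges`, `Contract`, `Churn`) follow the public dataset as commonly distributed, the three rows are invented, and the helper names are hypothetical.

```python
from statistics import median
from typing import Any, Dict, List

Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def impute_numeric(rows: List[Row], column: str) -> List[Row]:
    """Convert a column to float, filling missing entries with the median
    of the observed values (assumed strategy)."""
    observed = [float(r[column]) for r in rows if not is_missing(r.get(column))]
    fill = median(observed)
    return [
        {**r, column: fill if is_missing(r.get(column)) else float(r[column])}
        for r in rows
    ]


def one_hot(rows: List[Row], column: str) -> List[Row]:
    """Replace a categorical column with one 0/1 indicator per category."""
    categories = sorted({r[column] for r in rows})
    encoded_rows = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != column}
        for c in categories:
            encoded[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded_rows.append(encoded)
    return encoded_rows


def encode_target(rows: List[Row], column: str = "Churn") -> List[Row]:
    """Map the Yes/No target to 1/0 so that churn is the positive class."""
    return [{**r, column: 1 if r[column] == "Yes" else 0} for r in rows]


# Invented rows for illustration; the second one has a blank TotalCharges.
raw_rows = [
    {"TotalCharges": "30.5", "Contract": "Month-to-month", "Churn": "No"},
    {"TotalCharges": " ", "Contract": "Two year", "Churn": "No"},
    {"TotalCharges": "100.5", "Contract": "Month-to-month", "Churn": "Yes"},
]

clean_rows = encode_target(one_hot(impute_numeric(raw_rows, "TotalCharges"), "Contract"))
for row in clean_rows:
    print(row)
```

In a real pipeline the median and the category list would be computed on the training split only and then reused on the test split, so that no information leaks from the evaluation data into preprocessing.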
