February 26, 2026 | Open Access

A Unified Multi-Task Deep Learning Framework for Early Churn Detection with Risk-Aware and Explainable Recommendations

Key Points

Developed a multi-task deep learning framework for early detection of customer churn.
Trained a single model jointly on churn prediction, credit-score classification, and high-balance identification.
Applied a preprocessing pipeline to handle missing and extreme values.
Optimized decision thresholds on the validation set and used statistical significance testing to confirm the results.
Incorporated a rule-based explanation module for interpretability and operational usability.
Achieved an F1-score of 0.86 and recall of 0.85 for churn prediction.
Outperformed classical models including Logistic Regression, Random Forest, and MLP.
Provided enhanced interpretability through annotated visualizations and example-based explanations.
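
The Key Points mention a preprocessing pipeline for missing and extreme values but do not give its rules. The sketch below assumes median imputation followed by clipping at Tukey's IQR fences (k = 1.5); these choices and the helper names `fit_column` and `transform_column` are illustrative assumptions, not the authors' method. The statistics are fitted on the training split only, so validation and test rows cannot leak into preprocessing.

```python
from statistics import median, quantiles

# Assumed rules (not from the paper): median imputation + IQR-fence clipping.


def fit_column(train_values, k=1.5):
    """Learn the fill value and clipping fences from training data only.

    Missing entries are represented as None.
    """
    observed = [v for v in train_values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to fit")
    fill = median(observed)
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return fill, q1 - k * iqr, q3 + k * iqr


def transform_column(values, params):
    """Impute missing values, then clip extremes to the fitted fences."""
    fill, low, high = params
    return [min(max(fill if v is None else v, low), high) for v in values]


# Usage: fit on the training split, then apply the same params to any split.
params = fit_column([1, 2, None, 3, 4, 5, 6, 7, 8, 100])
print(transform_column([None, 100, -10, 4], params))  # [5, 13.0, -3.0, 4]
```

Fitting and transforming are kept separate so that the same fences can be reused unchanged at inference time.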
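
Validation-driven threshold optimization can be illustrated with a simple sweep. This sketch assumes that the objective is churn-class F1 and that the candidate thresholds are the distinct predicted probabilities on the validation split. The paper's actual objective and search grid are not stated here, and `best_threshold` is a hypothetical helper. Once chosen, the threshold would be frozen and applied unchanged to the test set.

```python
def f1_at(threshold, probs, labels):
    """F1 for the positive (churn) class when predicting churn if prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(val_probs, val_labels):
    """Return the candidate threshold with the highest validation F1.

    Candidates are scanned in ascending order, so ties go to the lowest
    threshold, which favors recall.
    """
    candidates = sorted(set(val_probs))
    return max(candidates, key=lambda t: f1_at(t, val_probs, val_labels))


val_probs = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
val_labels = [0, 1, 0, 1, 1, 0]
t = best_threshold(val_probs, val_labels)
print(t, f1_at(t, val_probs, val_labels))  # 0.4 1.0
```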
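
A rule-based explanation module like the one described in the Key Points can be sketched as a list of predicates, each paired with a business-readable reason. The feature names (`is_active_member`, `num_products`, `age`, `balance`), the cut-offs, and the reason texts below are hypothetical placeholders, not the paper's rules. The `reason_frequencies` helper shows one way to aggregate reasons across flagged customers.

```python
from collections import Counter

# Hypothetical rules: (feature, predicate, business-readable reason).
RULES = [
    ("is_active_member", lambda v: v == 0, "Customer is inactive"),
    ("num_products", lambda v: v >= 3, "Holds three or more products"),
    ("age", lambda v: v >= 50, "Aged 50 or over"),
    ("balance", lambda v: v == 0, "Zero account balance"),
]


def explain(customer, churn_prob, threshold=0.5):
    """Return the reasons whose rules fire, only for customers flagged as churn risks."""
    if churn_prob < threshold:
        return []
    return [
        reason
        for feature, fires, reason in RULES
        if feature in customer and fires(customer[feature])
    ]


def reason_frequencies(customers, probs, threshold=0.5):
    """Count how often each reason appears across all flagged customers."""
    counts = Counter()
    for customer, prob in zip(customers, probs):
        counts.update(explain(customer, prob, threshold))
    return counts


customers = [
    {"is_active_member": 0, "num_products": 3, "age": 55, "balance": 0.0},
    {"is_active_member": 1, "num_products": 1, "age": 30, "balance": 1000.0},
    {"is_active_member": 0, "num_products": 1, "age": 42, "balance": 500.0},
]
print(reason_frequencies(customers, [0.9, 0.2, 0.7]).most_common())
```

Because each rule is plain data, business users could review or edit the rule list without touching the model itself.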
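
The two churn metrics in the Key Points also pin down precision, since F1 is the harmonic mean of precision P and recall R. Assuming both figures refer to the churn class at the same decision threshold, solving for P gives roughly 0.87. Because the reported values are rounded to two decimals, this implied precision is only approximate.

```latex
F_1 = \frac{2PR}{P + R}
\;\Longrightarrow\; F_1 R = P\,(2R - F_1)
\;\Longrightarrow\; P = \frac{F_1 R}{2R - F_1}
= \frac{0.86 \times 0.85}{2(0.85) - 0.86}
= \frac{0.731}{0.84} \approx 0.870
```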

Abstract

This study presents a unified and reproducible multi-task deep learning framework for early customer churn detection that jointly learns three closely related tasks: churn prediction, credit-score classification, and high-balance identification. Exploiting the correlations between churn and the two auxiliary tasks enables positive transfer and strengthens the shared representation, which improves performance on the main churn task. A robust preprocessing pipeline handles missing and extreme values, while validation-driven threshold optimization and statistical significance testing (confidence intervals and McNemar tests) confirm that the improvements of the multi-task approach are statistically reliable. To enhance interpretability and operational usability, a rule-based explanation module converts model outputs into business-readable reasons, and a risk-alert mechanism maps predicted probabilities to calibrated risk levels; both are complemented by annotated visualizations, example-based explanations, aggregated reason frequencies, and visual summaries. Experiments on a real-world European banking dataset show that the proposed model outperforms classical baselines (Logistic Regression, Random Forest, and MLP), achieving an F1-score of 0.86 and a recall of 0.85 for churn prediction. The study also reports expanded dataset statistics, better-annotated figures, and visualizations of the risk distributions. The framework is computationally efficient and fully scripted for reproducibility.
Although the framework shows strong practical value, its evaluation is currently limited to a single dataset and to comparisons with classical baselines. Future work will explore cross-dataset evaluation, transformer-based or graph-based deep learning models, and data-driven explanation mechanisms that complement the rule-based reasoning.
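
The abstract describes one network that learns churn, credit-score class, and high-balance status jointly through a shared representation, but it does not specify the architecture. The pure-Python sketch below assumes a single shared ReLU layer feeding three heads (a sigmoid for churn, a softmax over an assumed number of credit-score bands, and a sigmoid for high balance) and a weighted sum of per-task losses. The layer sizes, the loss weights `(1.0, 0.5, 0.5)`, and all parameter names are hypothetical. Only the forward pass and the joint loss are shown; during training, gradients of the summed loss from all three heads would update the shared layer, which is where positive transfer can occur.

```python
import math


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, x):
    """Shared trunk followed by three task-specific heads."""
    h = relu(linear(params["W_shared"], params["b_shared"], x))
    p_churn = sigmoid(linear(params["W_churn"], params["b_churn"], h)[0])
    p_credit = softmax(linear(params["W_credit"], params["b_credit"], h))
    p_high_balance = sigmoid(linear(params["W_bal"], params["b_bal"], h)[0])
    return p_churn, p_credit, p_high_balance


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one example, with p clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def joint_loss(params, x, y_churn, y_credit, y_high_balance, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of churn BCE, credit-score cross-entropy, and high-balance BCE."""
    p_churn, p_credit, p_high_balance = forward(params, x)
    w_churn, w_credit, w_bal = weights
    return (
        w_churn * bce(p_churn, y_churn)
        + w_credit * -math.log(max(p_credit[y_credit], 1e-12))
        + w_bal * bce(p_high_balance, y_high_balance)
    )


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Usage: with all-zero parameters every head is maximally uncertain.
n_in, n_hidden, n_credit = 3, 2, 3
params = {
    "W_shared": zeros(n_hidden, n_in), "b_shared": [0.0] * n_hidden,
    "W_churn": zeros(1, n_hidden), "b_churn": [0.0],
    "W_credit": zeros(n_credit, n_hidden), "b_credit": [0.0] * n_credit,
    "W_bal": zeros(1, n_hidden), "b_bal": [0.0],
}
print(forward(params, [0.2, -1.0, 3.0]))
```

In practice an autodiff framework would replace these hand-written layers; the structure (one trunk, three heads, one summed loss) is the point of the sketch.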
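
The risk-alert mechanism maps each predicted churn probability to a discrete risk level. A minimal sketch, assuming three levels with hypothetical cut-points at 0.3 and 0.6 (the abstract does not give the paper's actual levels or boundaries); the mapping is only meaningful if the probabilities are reasonably calibrated. The `risk_distribution` helper counts customers per level, which is the kind of summary a risk-distribution plot would display.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut-points and labels; a probability equal to a cut-point
# falls into the higher level.
CUTS = (0.3, 0.6)
LEVELS = ("Low", "Medium", "High")


def risk_level(prob, cuts=CUTS, levels=LEVELS):
    """Map a churn probability in [0, 1] to a risk label."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    return levels[bisect_right(cuts, prob)]


def risk_distribution(probs):
    """Count how many customers fall into each risk level."""
    return Counter(risk_level(p) for p in probs)


print([risk_level(p) for p in (0.05, 0.3, 0.59, 0.6, 0.97)])
# ['Low', 'Medium', 'Medium', 'High', 'High']
```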
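
McNemar's test, one of the significance tests named in the abstract, compares two classifiers on the same test examples using only the discordant pairs: cases that one model classifies correctly and the other does not. This sketch implements the standard continuity-corrected chi-square statistic with one degree of freedom, plus the exact binomial variant often preferred when discordant counts are small. The function names are illustrative, and the abstract does not say which form the paper used.

```python
import math


def discordant_counts(y_true, pred_a, pred_b):
    """Return (a_only, b_only): examples only model A gets right, and only model B gets right."""
    a_only = b_only = 0
    for y, a, b in zip(y_true, pred_a, pred_b):
        if a == y and b != y:
            a_only += 1
        elif b == y and a != y:
            b_only += 1
    return a_only, b_only


def mcnemar(a_only, b_only, exact=False):
    """Return (statistic, two-sided p-value) for McNemar's test."""
    n = a_only + b_only
    if n == 0:
        return 0.0, 1.0  # no disagreements, so no evidence of a difference
    if exact:
        # Under H0 each discordant pair is a fair coin flip: Binomial(n, 0.5).
        k = min(a_only, b_only)
        tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
        return float(k), min(1.0, 2 * tail)
    # Edwards' continuity correction; chi-square with 1 degree of freedom.
    stat = max(abs(a_only - b_only) - 1, 0) ** 2 / n
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi2(1)
    return stat, p_value


counts = discordant_counts([1, 0, 1, 1, 0], [1, 0, 1, 0, 0], [0, 0, 1, 1, 1])
print(counts)                      # (2, 1)
print(mcnemar(10, 2))              # chi-square version
print(mcnemar(10, 2, exact=True))  # exact binomial version
```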
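
The abstract also reports confidence intervals without naming the method. A common choice is the percentile bootstrap: resample the test examples with replacement, recompute the metric on each resample, and take the empirical 2.5th and 97.5th percentiles. The sketch below applies this to churn-class F1; the bootstrap method, `n_boot`, and the fixed seed are assumptions for illustration only.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive (churn) class; returns 0.0 when undefined."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric=f1_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a classification metric."""
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    k = round(alpha / 2 * n_boot)
    return scores[k], scores[n_boot - 1 - k]


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(f1_score(y_true, y_pred), bootstrap_ci(y_true, y_pred))
```

Resampling whole examples keeps each prediction paired with its label, so the interval reflects sampling variability of the test set rather than of the model.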
