What question did this study set out to answer?

The study aims to improve phishing detection accuracy when a model trained on one dataset is applied to a different one, using a new ensemble transfer learning framework.

March 29, 2026. Open Access

Ensemble transfer learning for cross-dataset phishing detection

Key Points

Developed an ensemble transfer learning framework, Cross-Domain Ensemble Probability Fusion (CDEPF).
Harmonized the feature sets of two datasets with zero feature overlap into a unified 20-dimensional space using Principal Component Analysis (PCA).
Implemented an information-theoretic weighted fusion strategy for predictions.
Evaluated the framework on the UCI Phishing Dataset and the PhiUSIIL Phishing URL Dataset.
Achieved 94.4% cross-dataset accuracy, significantly surpassing the 57.4% baseline.
Demonstrated a 64.3% relative improvement over the baseline, with high statistical significance (p < 0.0001).
Provided a robust solution for cross-domain phishing detection, validated for practical deployment.
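
The summary does not spell out the fusion rule, so the following is only a minimal sketch of what an information-theoretic weighted fusion could look like. It assumes each model outputs a phishing probability and that a model's weight is one minus the binary Shannon entropy of that probability, so confident models count for more. The function names and the weighting formula are illustrative assumptions, not the CDEPF definition.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def fuse(probs: list[float]) -> float:
    """Fuse a non-empty list of per-model phishing probabilities.

    Assumed weighting (not taken from the paper): w_i = 1 - H(p_i),
    so a model that outputs 0.5 (maximum uncertainty) gets zero weight.
    """
    weights = [1.0 - binary_entropy(p) for p in probs]
    total = sum(weights)
    if total == 0.0:
        # Every model is maximally uncertain: fall back to a plain average.
        return sum(probs) / len(probs)
    return sum(w * p for w, p in zip(weights, probs)) / total


# Three hypothetical model outputs for one URL.
fused = fuse([0.95, 0.60, 0.50])
label = "phishing" if fused >= 0.5 else "legitimate"
```

Under this assumed scheme the confident 0.95 prediction dominates the fused score, while the uninformative 0.50 prediction contributes nothing.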

Abstract

Machine learning-based phishing detection models suffer from significant performance degradation when deployed across different datasets, a critical challenge that limits their real-world applicability. This research addresses this cross-dataset generalization problem by developing and validating a novel ensemble transfer learning framework designed to ensure robust performance in diverse operational environments. The proposed Cross-Domain Ensemble Probability Fusion (CDEPF) framework was evaluated using two heterogeneous datasets with zero feature overlap: the historical UCI Phishing Dataset and the modern PhiUSIIL Phishing URL Dataset. The methodology involves harmonizing these disparate feature sets into a unified 20-dimensional space using Principal Component Analysis (PCA) and integrating predictions through an information-theoretic weighted fusion strategy. Experimental results demonstrate that the CDEPF framework achieves a cross-dataset accuracy of 94.4%, a substantial increase from the 57.4% baseline performance. This represents a 64.3% relative improvement, validated with high statistical significance (p < 0.0001) and a large practical effect size. The framework provides a robust and deployment-ready solution that effectively bridges the performance gap in cross-domain phishing detection. This study contributes a validated methodological approach for domain adaptation in cybersecurity, enhancing the reliability of machine learning models against evolving cyber threats. Future work should explore multi-domain transfer architectures and real-world deployment validation.

• Novel ensemble transfer learning framework achieving 94.4% cross-dataset accuracy.
• Comprehensive feature harmonization for zero-overlap datasets using PCA.
• Statistical validation with 64.3% relative improvement over baseline.
• Practical deployment-ready solution for cybersecurity applications.
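
The harmonization step described in the abstract can be illustrated with a small sketch: two feature matrices with different, non-overlapping columns are each standardized and projected onto their own top 20 principal components, so both end up in a 20-dimensional space. This is an assumption-laden illustration using NumPy and random stand-in data; the feature counts (30 and 50), the per-dataset standardization, and the `pca_project` helper are hypothetical. The abstract does not say how the two projected spaces are aligned with each other, and fitting PCA separately, as done here, only matches their dimensionality.

```python
import numpy as np


def pca_project(X: np.ndarray, n_components: int = 20) -> np.ndarray:
    """Standardize X and project its rows onto the top principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # avoid division by zero on constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


rng = np.random.default_rng(0)
source = rng.normal(size=(200, 30))  # stand-in for one dataset (hypothetical shape)
target = rng.normal(size=(150, 50))  # stand-in for the other (hypothetical shape)

source_20 = pca_project(source)  # shape (200, 20)
target_20 = pca_project(target)  # shape (150, 20)
```

With both datasets reduced to the same width, a classifier trained on one representation can at least accept inputs from the other, which is the precondition for any cross-dataset evaluation.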

Cite This Study

Henry et al. (2026) studied this question.

synapsesocial.com/papers/69c8c115de0f0f753b39bbbf https://doi.org/10.1016/j.fraope.2026.100579