Cyber-enabled fraud has grown substantially more sophisticated over the last decade, with phishing emerging as one of the most damaging and widely deployed attack vectors in the digital threat landscape. This study introduces PhishGuard AI, a detection framework that fuses traditional supervised learning with the cross-lingual capabilities of Multilingual BERT (mBERT) to flag phishing attempts written in any of 104 languages. A fundamental shortcoming of current commercial and academic systems is their near-exclusive reliance on English text and pre-catalogued threat signatures, rendering them blind to freshly launched campaigns and non-English content. To overcome these constraints, PhishGuard AI employs a single unified pipeline that accepts both raw URLs and full email messages as input, extracting a 21-dimensional set of structural and live behavioural indicators while simultaneously deriving 50-dimensional language embeddings through principal component reduction of the bert-base-multilingual-cased CLS vector. Experimental validation on 11,055 balanced samples under a rigorous 10-fold stratified cross-validation regime yielded a Random Forest accuracy of 97.60% and an AUC-ROC of 0.978; a soft-voting ensemble of three classifiers pushed accuracy further to 97.90%. Deployment is realised through a Flask REST API serving a browser extension, a web dashboard, and an early-stage email gateway prototype.
Shejwal et al. (Thu,) studied this question.
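Two steps named in the abstract, structural feature extraction from a raw URL and soft voting across classifiers, can be illustrated with a minimal sketch. It uses only the Python standard library and rests on stated assumptions. The six indicators in `url_features` are a hypothetical subset chosen for illustration, not the 21 indicators used by PhishGuard AI. The function names `url_features` and `soft_vote` and the class ordering (index 0 = legitimate, index 1 = phishing) are likewise assumptions. The probabilities in the usage example are made-up values, not experimental results.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse


def url_features(url: str) -> List[float]:
    """Return a few illustrative structural indicators for a URL.

    Hypothetical subset for illustration; not the paper's 21-dimensional set.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    is_ipv4 = re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None
    return [
        float(len(url)),                  # total URL length
        float(host.count(".")),           # number of dots in the host name
        float(is_ipv4),                   # host is a raw IPv4 address
        float("@" in url),                # '@' symbol appears in the URL
        float(parsed.scheme == "https"),  # scheme is HTTPS
        float("-" in host),               # hyphen appears in the host name
    ]


def soft_vote(probabilities: List[List[float]]) -> Tuple[int, List[float]]:
    """Average per-class probabilities over classifiers and pick the argmax.

    `probabilities[m][c]` is classifier m's probability for class c.
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [
        sum(p[c] for p in probabilities) / n_models
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=mean.__getitem__)
    return label, mean


if __name__ == "__main__":
    print(url_features("http://192.168.0.1/login@secure"))
    # Made-up outputs of three classifiers for one sample.
    print(soft_vote([[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]))
```

In the usage example, two of the three hypothetical classifiers lean slightly towards class 1, so a majority (hard) vote would select class 1. Soft voting instead averages the probabilities, and the one confident classifier tips the result to class 0. This sensitivity to classifier confidence is the property that distinguishes soft voting from hard voting.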