Phishing sites cause growing harm to consumers, commercial enterprises, and online infrastructure. Online safety depends on detecting these malicious sites promptly and accurately. Many existing solutions rely on lexical features, token features, or structural hints. Although useful to a degree, these methods tend to miss the contextual meaning carried by URLs. This paper presents SemanticPhishNet, a hybrid detection system that applies a transformer-based model to HTML documents and URL information to detect phishing accurately and efficiently. The architecture uses MiniLM (a distilled transformer of the same kind as DistilBERT) to obtain contextual embeddings of cleaned HTML and augmented URL text, and a simple dense classifier to perform binary classification. The model was evaluated on a stratified three-way split of the data that includes real-world obfuscation patterns such as the replacement of “http” with “hxxp”. The experimental findings show that SemanticPhishNet performs well across several measures, outperforming other state-of-the-art models in accuracy, recall, and generalization ability. We conduct both cross-validation and external validation on independent data. The framework achieves 96–97% cross-validation accuracy, while external evaluation yields 67% accuracy, a more realistic measure of generalization that reveals the difficulty of domain shift in phishing. The proposed model performs better than many existing models in real-world settings. The confusion matrices and ROC analysis indicate that the phishing and benign classes are consistently separated in both the validation and test sets. The findings show that the proposed model is efficient, stable, and scalable for present-day phishing detection.
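The two data-preparation ideas mentioned above, the “http” to “hxxp” obfuscation and the stratified three-way split, can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' pipeline: the function names `defang`, `refang`, and `stratified_three_way_split` are hypothetical, and the 70/15/15 ratios are an assumption, since the abstract does not state the split proportions.

```python
import random
from collections import defaultdict


def defang(url: str) -> str:
    """Return an obfuscated variant of a URL by rewriting a leading 'http' as 'hxxp'."""
    if url.lower().startswith("http"):
        return "hxxp" + url[4:]
    return url


def refang(url: str) -> str:
    """Undo the 'hxxp' obfuscation so both variants share one canonical form."""
    if url.lower().startswith("hxxp"):
        return "http" + url[4:]
    return url


def stratified_three_way_split(labels, train=0.7, val=0.15, seed=0):
    """Split sample indices into train/val/test, class by class.

    Splitting each class separately keeps the class ratios similar
    across the three parts. The default ratios are assumptions.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * train)
        n_val = int(len(indices) * val)
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]  # remainder goes to test
    return splits
```

Adding `defang` variants to the training text exposes a model to obfuscated URLs, while `refang` shows the opposite choice of normalizing them away; the abstract indicates only that such patterns were present in the evaluation data.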
The paper stresses the importance of appropriate evaluation techniques, such as leakage-aware splits and cross-dataset evaluation.
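One common reading of a leakage-aware split is that all URLs from the same site are kept on the same side of the train/test boundary, so the model is never tested on a host it has already seen. The sketch below illustrates that idea under stated assumptions: the abstract does not describe the paper's actual procedure, the names `host_of` and `group_split_by_host` are hypothetical, and the plain hostname is used as a rough stand-in for the registered domain.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lower-cased hostname of a URL (empty string if none can be parsed)."""
    return (urlsplit(url).hostname or "").lower()


def group_split_by_host(urls, test_fraction=0.2, seed=0):
    """Assign whole host groups to train or test so that no host spans both.

    test_fraction is a target, not a guarantee: groups are never split,
    so the test set may overshoot it. The 0.2 default is an assumption.
    """
    groups = defaultdict(list)
    for idx, url in enumerate(urls):
        groups[host_of(url)].append(idx)

    hosts = sorted(groups)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(hosts)

    target = test_fraction * len(urls)
    train, test = [], []
    for host in hosts:
        # Fill the test set first, then send remaining groups to train.
        bucket = test if len(test) < target else train
        bucket.extend(groups[host])
    return train, test
```

Cross-dataset evaluation goes one step further by drawing the test set from an independent source altogether, which is the setting in which the abstract reports the lower 67% accuracy.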
Emad Ul Haq Qazi
Naif Arab University for Security Sciences
Muhammad Hamza Faheem
Abdulrazaq Almorjan
Naif Arab University for Security Sciences
Computers
Naif Arab University for Security Sciences
Qazi et al. (Sun,) studied this question.
synapsesocial.com/papers/6a1539ccb5d9c58d83e8cdef — DOI: https://doi.org/10.3390/computers15060335