September 5, 2025

Robust cybersecurity through discrete and large language models for effective phishing attack detection

Key Points

DistilBERT achieved an F1-score of 99.992% on the primary dataset, indicating exceptional performance in detecting phishing URLs.
On the external dataset with minimal features, BERT and DeBERTa yielded low F1-scores of 1.14% and 2.78%, respectively, despite high precision, which indicates poor recall.
RoBERTa achieved the highest F1-score on the external dataset (68.03%), suggesting a better balance between precision and recall under constrained conditions.
The study demonstrates the critical role of adopting robust models across diverse scenarios for effective cybersecurity.

Abstract

This study explores the efficacy of four Large Language Models (LLMs), namely BERT, DistilBERT, RoBERTa, and DeBERTa, in classifying URLs as either legitimate or phishing. The research methodology is structured into three phases: dataset processing, model fine-tuning, and performance evaluation. Each LLM is fine-tuned to distinguish between phishing and legitimate URLs. The models are evaluated using both a primary dataset with extensive features and an external dataset with minimal features to rigorously assess their robustness. The models consistently achieved high performance on the primary dataset, with AUC scores of 0.99, indicating near-perfect discrimination between phishing and legitimate URLs. DistilBERT excels in F1-score (99.992%), accuracy (99.991%), and precision (99.985%), showcasing its efficiency in real-world applications. BERT and DeBERTa also demonstrate excellent results, while RoBERTa, though slightly lower in precision, remains competitive. The models were also evaluated on an external test dataset containing 450,176 labeled URLs, which helped assess their performance under extremely constrained conditions. On this dataset, BERT and DeBERTa showed low F1-scores (1.14% and 2.78%, respectively) despite high precision, indicating poor recall. DistilBERT performed moderately better with a 42.44% F1-score, while RoBERTa achieved the highest F1-score of 68.03%, suggesting a superior balance between precision and recall under severe feature constraints. Overall, while all models exhibit strong performance with rich features, their ability to maintain efficacy under limited feature conditions varies. This study underscores the importance of developing models with robust performance across diverse scenarios, highlighting RoBERTa’s superior recall in feature-scarce environments and DistilBERT’s overall efficiency.
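
The abstract's remark that low F1-scores "despite high precision" indicate poor recall follows from F1 being the harmonic mean of precision and recall. The sketch below illustrates that relationship. It is not the authors' evaluation code: the function names are hypothetical, and the precision value of 0.99 is an assumption chosen for illustration, since the abstract does not report external-dataset precision figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given F1 and P."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than 2 * precision")
    return f1 * precision / denominator


# ASSUMPTION: illustrative value only; the abstract says "high precision"
# without giving a number for the external dataset.
ASSUMED_PRECISION = 0.99

# External-dataset F1-scores reported in the abstract for the two
# models described as having high precision.
reported_f1 = {"BERT": 0.0114, "DeBERTa": 0.0278}

for model, f1 in reported_f1.items():
    recall = implied_recall(f1, ASSUMED_PRECISION)
    print(f"{model}: F1={f1:.4f} implies recall of about {recall:.4f} "
          f"if precision were {ASSUMED_PRECISION}")
```

Under that assumed precision, both implied recall values fall below 2%, which is consistent with the abstract's reading that these models miss most phishing URLs in the external dataset while rarely mislabeling legitimate ones.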

Cite This Study

Goyal et al. (Wed,) studied this question.

synapsesocial.com/papers/68bb5f266d6d5674bcd02fd0 https://doi.org/10.47974/jdmsc-2162
