Phishing websites pose a major cybersecurity threat, exploiting unsuspecting users and causing significant financial and organisational harm. Traditional machine learning approaches for phishing detection often require extensive feature engineering, continuous retraining, and costly infrastructure maintenance. At the same time, proprietary large language models (LLMs) have demonstrated strong performance in phishing-related classification tasks, but their operational costs and reliance on external providers limit their practical adoption in many business environments. This paper presents a detection pipeline for malicious websites and investigates the feasibility of Small Language Models (SLMs) using raw HTML code and URLs. A key advantage of these models is that they can be deployed on local infrastructure, providing organisations with greater control over data and operations. We systematically evaluate 15 commonly used SLMs, ranging from 1 billion to 70 billion parameters, benchmarking their classification accuracy, computational requirements, and cost-efficiency. Our results highlight the trade-offs between detection performance and resource consumption. While SLMs underperform compared to state-of-the-art proprietary LLMs, the gap is moderate: the best SLM achieves an F1-score of 0.893 (Llama3.3:70B), compared to 0.929 for GPT-5.2, indicating that open-source models can provide a viable and scalable alternative to external LLM services.
Goldenits et al. (Thu,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: