As cyber-security threats grow increasingly sophisticated, strong prediction and detection systems that can distinguish safe websites from harmful ones are essential. This paper presents a method that applies several machine learning techniques to identify malicious URLs efficiently. Implemented in Python, the proposed method uses a curated dataset of 20,000 URLs collected from three sources, with 60 features extracted from each URL and an exact balance of 50% phishing URLs and 50% legitimate URLs. The main aim of the paper is to develop a machine learning based system that accurately classifies URLs, contributing to a higher level of security. To this end, the paper investigates the effectiveness of Random Forests (RFs), Decision Trees (DTs), Support Vector Machines (SVMs), k-Nearest Neighbors (KNNs), Logistic Regression, and Artificial Neural Networks (ANNs). The experimental results show that the system performs very well: the Random Forest classifier reached a test accuracy of 99%, demonstrating its ability to separate legitimate from malicious URLs, and the ANN achieved an accuracy of 98%. Overall, five of the six tested algorithms reported an accuracy greater than or equal to 94.5%.
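The abstract states that 60 features are extracted from each URL but does not list them. The following is a minimal sketch of what lexical URL feature extraction might look like in Python, using only the standard library. The function name `extract_features` and the ten features shown are illustrative assumptions, not the paper's actual feature set.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Matches a dotted-quad host such as "192.168.0.1" (no range check on octets).
IPV4_RE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")


def extract_features(url: str) -> dict:
    """Return a small dictionary of lexical features for one URL.

    Hypothetical feature set chosen for illustration only; the paper's
    60 features are not enumerated in the abstract.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ipv4": int(bool(IPV4_RE.fullmatch(host))),
        # Count non-empty path segments, e.g. "/a/b" has depth 2.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "num_query_params": len(parse_qsl(parts.query)),
    }


if __name__ == "__main__":
    for sample in ("https://example.com/a/b?x=1&y=2",
                   "http://192.168.0.1/login@secure"):
        print(sample, extract_features(sample))
```

Each URL becomes a fixed-length numeric vector in this way, which is the form that classifiers such as Random Forests or Logistic Regression expect as input.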
Hani et al. studied this question.