This paper aims to classify URLs and web pages as legitimate or malicious in order to alert users and allow safer browsing. In the process we identified attributes that expose the characteristics of malicious sources, making it possible to recognize them and prevent the damage they might cause. These attributes relate to the domain registration of the URL, the URL text, the structure of the web page, and its contents. The method also uses concepts such as web page reputation and the internal and external links of a web page. Combining models such as BERT, LSTM, and Decision Trees into an ensemble yields a pragmatic solution to the problem, with an accuracy of 95.3%. The classification method used in this paper, which combines Natural Language Processing techniques and Machine Learning models over such a wide variety of features, has not been implemented earlier. We conclude the paper by suggesting ways to improve the solution.
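As an illustration of one of the features named above, the internal and external links of a web page could be counted roughly as follows. This is a minimal sketch, not the authors' implementation: the names `LinkCollector` and `count_links` are hypothetical, and the rule that a link is internal when its host matches the page's host is an assumption, since the abstract does not define the terms.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def count_links(page_url, html):
    """Return (internal, external) link counts for a page.

    Assumption: a link is internal when its host equals the page's host.
    """
    page_host = urlparse(page_url).netloc.lower()
    collector = LinkCollector()
    collector.feed(html)

    internal = external = 0
    for href in collector.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        target = urlparse(urljoin(page_url, href))
        if target.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if target.netloc.lower() == page_host:
            internal += 1
        else:
            external += 1
    return internal, external
```

Counts like these could then be passed, alongside the other attributes, as numeric inputs to a classifier such as a decision tree.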
Venugopal et al. studied this question.