November 25, 2022

Deep learning methods for malicious URL detection using embedding techniques as Logistic Regression with Lasso penalty and Random Forest

Abstract

In the Internet ecosystem, URLs (Uniform Resource Locators) are widely used to propagate malicious infections through spam, spear-phishing, drive-by-download exploitation, malware embedding, etc. Blacklisting, pattern analysis, and signature-matching approaches are widely used to detect such threats. These techniques are effective at detecting known types of malicious URLs, but they cannot detect new types of attacks launched through malicious URLs. Moreover, traditional machine learning-based malicious URL detection depends heavily on manual feature engineering, which is costly. To overcome these challenges, this study proposes deep learning-based malicious URL detection that depends less on manual feature engineering. A Natural Language Processing (NLP) technique, the Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer with an N-gram parameter, is used to extract the features. Afterward, two embedded feature selection methods are applied: Logistic Regression with an L1 (Lasso) penalty and Random Forest selection. Neural network models, namely Deep Neural Networks (DNN), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN), are then constructed to detect malicious URLs. The experimental results show that the DNN model performs best, reaching 96.95% accuracy, 99% precision, 100% recall, and a 99% F1-score when logistic regression with the L1 (Lasso) penalty is used as the feature selection method. Finally, this study compares the implemented state-of-the-art deep learning methods for malicious URL detection.
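To make the feature extraction step concrete, the following is a minimal sketch of TF-IDF weighting over URL n-grams in plain Python. It is not the authors' implementation. The abstract does not say whether the n-grams are taken over characters or words, which range of n is used, or which IDF variant is applied, so the character-level n-grams, the range 1 to 3, the smoothed IDF formula, and the L2 normalisation are all assumptions made for illustration. The function names and sample URLs are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(url, n_min=1, n_max=3):
    """Return all character n-grams of the URL for n in [n_min, n_max]."""
    return [url[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(url) - n + 1)]


def tfidf_vectors(urls, n_min=1, n_max=3):
    """Map each URL to a dict {ngram: TF-IDF weight}, L2-normalised."""
    docs = [Counter(char_ngrams(u, n_min, n_max)) for u in urls]
    n_docs = len(docs)
    # Document frequency: the number of URLs in which each n-gram appears.
    df = Counter(g for doc in docs for g in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        # Assumed IDF variant (smoothed): ln((1 + N) / (1 + df)) + 1
        weights = {g: (c / total) * (math.log((1 + n_docs) / (1 + df[g])) + 1)
                   for g, c in doc.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({g: w / norm for g, w in weights.items()})
    return vectors


# Hypothetical sample URLs, for illustration only.
urls = [
    "http://example.com/login",
    "http://example.com/home",
    "http://paypa1-secure.test/login.php",
]
vectors = tfidf_vectors(urls)
```

Each URL becomes a sparse vector in which n-grams that are rare across the collection receive a higher weight than n-grams shared by every URL, such as `http`. In the pipeline the abstract describes, vectors of this kind would then pass through feature selection before reaching the neural network models.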
