March 19, 2025 | Open Access

Explainable TabNet ensemble model for identification of obfuscated URLs with features selection to ensure secure web browsing

Abstract

Obfuscated and malicious URLs can expose a system to harmful content or actions, such as malware downloads, phishing, scams, or adware. Identifying obfuscated Uniform Resource Locators (URLs) is therefore an important problem in cybersecurity. This study proposes a robust, unified TabNet ensemble model for identifying malicious URLs, with feature extraction based on computed feature importance for classification. A fine-tuned TabNet, an attention-based deep neural network, is used to extract the features of the URLs. A customized dataset containing the most important features is then generated, and a machine learning (ML) ensemble model is developed to classify the URLs. The performance of the TabNet ensemble model is evaluated using accuracy, precision, recall, and F1-score. The proposed model achieves an accuracy of 97.8%, a precision of 0.978, a recall of 0.976, and an F1-score of 0.978 when classifying the five URL classes. The model is further validated statistically using the Kappa value, which is 0.968 for the proposed model. With 10-fold cross-validation, the model attains a mean accuracy of 97.27% with a confidence interval of 0.004. Local Interpretable Model-agnostic Explanations (LIME), an explainable AI technique, is used to validate the model and to identify the features that contribute to its classifications. The results are compared with state-of-the-art ML classifiers and previous studies, and the overall validation supports the efficacy of the proposed model.
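The two statistical checks named in the abstract, the Kappa value and the cross-validation mean with its confidence interval, can be illustrated with a minimal sketch. It is not the authors' code. The function names `cohen_kappa` and `mean_with_ci`, the toy labels, and the per-fold scores are hypothetical. The abstract does not say how the interval was computed, so the normal approximation (z = 1.96) used here is an assumption.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    # Fraction of samples where the prediction matches the label.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Undefined (division by zero) if both sequences hold one identical class.
    return (observed - expected) / (1 - expected)


def mean_with_ci(fold_scores, z=1.96):
    """Mean of per-fold scores and the half-width of a normal-approximation interval."""
    m = mean(fold_scores)
    half_width = z * stdev(fold_scores) / sqrt(len(fold_scores))
    return m, half_width


# Toy data only: these are not results from the paper.
toy_true = [0, 0, 1, 1]
toy_pred = [0, 1, 1, 1]
toy_kappa = cohen_kappa(toy_true, toy_pred)

toy_fold_accuracies = [0.97, 0.98, 0.97, 0.98]
toy_mean, toy_half_width = mean_with_ci(toy_fold_accuracies)
```

Kappa is reported alongside accuracy because it discounts agreement that would occur by chance, which matters when the five URL classes are not equally frequent.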
