ABSTRACT
Insider threats remain one of the most challenging issues in cybersecurity, because malicious activities are carried out by legitimate users and are difficult to distinguish from normal behavior. The rarity of insider events also produces highly imbalanced datasets. This reduces the effectiveness of conventional rule-based, machine learning, and deep learning approaches, which often suffer from low precision and high false positive rates. This work proposes an insider threat detection framework based on Extreme Gradient Boosting (XGBoost) tuned with Bayesian Optimization (BO). Class imbalance is addressed with the Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors (SMOTEENN). The framework is further strengthened through feature engineering that captures behavioral and temporal patterns of user activity. The methodology is assessed on the synthetic CERT r4.2 insider threat dataset from Carnegie Mellon University (CMU), where single-day sequential activity logs are processed into numerical feature vectors. The model is trained on r4.2, evaluated on r4.2, and additionally tested for generalization on the newer r5.2 and r6.2 releases. Performance is measured under both balanced and imbalanced configurations across different data ratios. The results consistently show that feature engineering significantly improves detection capability. In particular, on r4.2 the model achieves 99.0% accuracy, 96.2% precision, 96.9% recall, 96.6% F1-score, and a ROC-AUC of 99.7%. Comparable robustness is observed on r5.2 and r6.2, supporting the reliability and transferability of the approach across datasets. These findings demonstrate a clear advantage of the proposed framework over current baseline models.
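The abstract describes converting single-day activity logs into numerical feature vectors. The following is a minimal sketch of that kind of per-user, per-day aggregation. It assumes events arrive as (user, timestamp, pc, activity) tuples with CERT-style `MM/DD/YYYY HH:MM:SS` timestamps. The activity vocabulary, the 08:00 to 18:00 office-hours window, and the function name `user_day_features` are hypothetical illustrations, not the paper's actual feature set.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical activity vocabulary; the paper's exact features are not listed in the abstract.
ACTIVITIES = ["Logon", "Logoff", "Connect", "Disconnect", "http", "email", "file"]
WORK_START, WORK_END = 8, 18  # assumed office hours (hour of day, end exclusive)


def user_day_features(events):
    """Aggregate (user, timestamp, pc, activity) events into one numeric vector per user-day.

    Vector layout (an assumption for illustration):
    [count of each activity in ACTIVITIES..., after-hours event count, distinct PCs used]
    """
    buckets = defaultdict(list)
    for user, ts, pc, activity in events:
        t = datetime.strptime(ts, "%m/%d/%Y %H:%M:%S")
        buckets[(user, t.date())].append((t, pc, activity))

    features = {}
    for key, day_events in buckets.items():
        # Frequency of each activity type during the day
        counts = [sum(1 for _, _, a in day_events if a == name) for name in ACTIVITIES]
        # Temporal signal: events outside the assumed office-hours window
        after_hours = sum(1 for t, _, _ in day_events if not WORK_START <= t.hour < WORK_END)
        # Behavioral signal: how many different machines the user touched
        distinct_pcs = len({pc for _, pc, _ in day_events})
        features[key] = counts + [after_hours, distinct_pcs]
    return features
```

Each resulting vector is one row for the classifier. A labeled training set would then pair each (user, day) key with an insider or normal label.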
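SMOTEENN combines two steps. SMOTE first creates synthetic minority samples by interpolating between a minority sample and one of its minority-class nearest neighbors. Edited Nearest Neighbors (ENN) then removes samples whose neighborhood votes against their label. The dependency-free sketch below illustrates the idea on small in-memory lists. The helper names, the majority-vote editing rule, and the defaults are assumptions: k = 5 for SMOTE, k = 3 for ENN, and oversampling up to the majority-class count. In practice a library implementation such as imbalanced-learn's `SMOTEENN` would normally be used, and it may apply a different editing rule.

```python
import math
import random


def _k_nearest(idx, points, k):
    """Indices of the k points nearest (Euclidean) to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority, n_new, k=5, rng=random):
    """Create n_new synthetic samples on segments between minority points and their neighbors.

    Requires at least two minority samples.
    """
    synthetic = []
    k = min(k, len(minority) - 1)
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(_k_nearest(i, minority, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def enn(X, y, k=3):
    """Wilson editing: drop each sample whose label is outvoted by its k nearest neighbors."""
    keep = []
    for i in range(len(X)):
        labels = [y[j] for j in _k_nearest(i, X, k)]
        if labels.count(y[i]) * 2 >= len(labels):  # keep unless a strict majority disagrees
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smoteenn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class to the majority count (binary labels), then clean with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = len(y) - len(minority)
    new = smote(minority, max(majority_count - len(minority), 0), k_smote, rng)
    X_all = list(X) + new
    y_all = list(y) + [minority_label] * len(new)
    return enn(X_all, y_all, k_enn)
```

Resampling of this kind should be applied to the training split only, so that the evaluation data keep their natural class ratio.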
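The reported metrics follow their standard definitions. The sketch below computes accuracy, precision, recall, and F1 from binary labels, with 1 marking an insider user-day. It computes ROC-AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. The function names are hypothetical, and the quadratic pairwise AUC loop is meant for clarity rather than large datasets.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = insider, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

As a sanity check on the abstract's figures, 2 × 0.962 × 0.969 / (0.962 + 0.969) ≈ 0.965. This matches the reported 96.6% F1-score up to rounding of the published precision and recall.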
Ambairam Muthu Sivakrishna
R. Mohan
Valaparla Rohini
Security and Privacy
National Institute of Technology Tiruchirappalli
Sivakrishna et al. studied this question.
www.synapsesocial.com/papers/68f83321d24b29c969481f30 — DOI: https://doi.org/10.1002/spy2.70122