
April 30, 2026 (Open Access)

Illuminating the Shadows: An Explainable AI-Driven Approach with Ensemble Learning for Insider Threat Detection

Key Points

This research aims to develop an explainable AI framework to improve insider threat detection in security operations centers.
Integrated ensemble learning models (Random Forest, XGBoost, and Stacking) with behavioral feature engineering.
Applied LLM-driven filtering, guided by dataset metadata and MITRE ATT&CK-aligned logic, to refine the behavioral data.
Used SHAP and LIME for explainability, restricting the analysis to the weekly level because of computational constraints.
The proposed framework improves detection while reducing false positives and increasing analyst trust.
Demonstrated a dual-layer explainability architecture that provides both global (SHAP) and instance-level (LIME) insights.
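SHAP attributes a model's prediction to its input features using Shapley values. The sketch below is a minimal, brute-force illustration under stated assumptions, not the paper's implementation or the `shap` library's algorithm. It computes exact Shapley values against a single baseline vector, which is feasible only for a handful of features. The per-instance values serve as the local layer, and their mean absolute values across instances serve as the global layer. In the paper, the instance-level layer comes from LIME rather than Shapley values. The three weekly features, the baseline, and the `risk_score` model are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]

def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1, drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                coalition = frozenset(members)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi

def global_importance(model: Model, rows, baseline) -> list[float]:
    """Global layer: mean absolute Shapley value of each feature over all rows."""
    per_row = [shapley_values(model, row, baseline) for row in rows]
    return [sum(abs(p[i]) for p in per_row) / len(per_row) for i in range(len(baseline))]

# Hypothetical weekly features: [after_hours_logons, usb_events, files_copied]
def risk_score(z: Sequence[float]) -> float:
    # Toy model with a USB x file-copy interaction term (illustrative only).
    return 0.5 * z[0] + 1.0 * z[1] + 0.2 * z[2] + 0.1 * z[1] * z[2]

baseline = [2.0, 0.0, 10.0]     # assumed "typical week" reference point
suspicious = [6.0, 3.0, 40.0]

local = shapley_values(risk_score, suspicious, baseline)                   # ≈ [2.0, 10.5, 10.5]
overall = global_importance(risk_score, [suspicious, baseline], baseline)  # ≈ [1.0, 5.25, 5.25]
```

The local values sum to `risk_score(suspicious) - risk_score(baseline)` (23.0). This is the efficiency property that SHAP explanations also satisfy. The interaction term's credit is shared between `usb_events` and `files_copied` rather than being assigned to either one alone.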

Abstract

In response to the increasing complexity of insider threats, this study proposes an explainable AI-driven framework designed to emulate real-world analyst workflows in security operations centers (SOCs). The framework integrates ensemble learning models (Random Forest, XGBoost, and Stacking) with behavioral feature engineering at multiple temporal granularities (session, daily, and weekly), so that it can support both fine-grained detection and long-term behavioral analysis. The pipeline proceeds in four stages. First, LLM-driven filtering refines the behavioral data using dataset metadata and MITRE ATT&CK-aligned logic. Second, ensemble models perform detection. Third, SHAP and LIME explain the model outputs. Fourth, an LLM interprets those explanations for analysts. A key contribution of this work is a dual-layer explainability architecture in which SHAP values capture global feature importance and LIME provides instance-level explanations, complemented by LLM-generated interpretations aligned with the MITRE ATT&CK framework. Because of computational constraints, modeling, full SHAP/LIME explainability, and LLM-guided filtering are applied only at the weekly level. This design yields stable, interpretable behavioral analysis; extending it to the daily and session levels is left to future work. The filtering logic simulates playbook-based SOC automation using dataset metadata and MITRE-aligned patterns, reflecting how large-scale behavioral data are handled in practice. Although contextual telemetry is unavailable (for example, Security Information and Event Management (SIEM) data, Data Loss Prevention (DLP) records, or network logs), the pipeline still produces transparent, prioritized alerts that reduce false positives and improve analyst trust.
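The abstract describes a filtering stage that simulates SOC playbook automation before any model sees the data. The minimal sketch below assumes the LLM's output has already been turned into deterministic rules. Each hypothetical rule pairs a MITRE ATT&CK technique ID with a predicate over weekly features, and only weeks that trigger at least one rule are kept and tagged. T1052.001 (Exfiltration over USB) and T1567 (Exfiltration Over Web Service) are real ATT&CK IDs. The record fields, thresholds, and the `playbook_filter` helper are illustrative assumptions; the paper's LLM-driven filter is not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Callable

@dataclass(frozen=True)
class WeeklyRecord:
    user: str
    week: str                          # ISO week label such as "2026-W15"
    usb_events: int
    files_copied: int
    web_uploads: int
    techniques: tuple[str, ...] = ()   # ATT&CK IDs attached by the filter

@dataclass(frozen=True)
class PlaybookRule:
    technique_id: str                  # MITRE ATT&CK technique the rule is aligned with
    name: str
    matches: Callable[[WeeklyRecord], bool]

# Hypothetical rules of the kind an LLM could derive from dataset metadata;
# the thresholds are illustrative assumptions, not values from the paper.
RULES = (
    PlaybookRule("T1052.001", "Exfiltration over USB",
                 lambda r: r.usb_events > 0 and r.files_copied >= 20),
    PlaybookRule("T1567", "Exfiltration Over Web Service",
                 lambda r: r.web_uploads >= 5),
)

def playbook_filter(records, rules=RULES):
    """Keep only weeks matching at least one rule, tagged with the matched technique IDs."""
    kept = []
    for record in records:
        hits = tuple(rule.technique_id for rule in rules if rule.matches(record))
        if hits:
            kept.append(replace(record, techniques=hits))  # original record is left unchanged
    return kept

weeks = [
    WeeklyRecord("u1", "2026-W15", usb_events=2, files_copied=35, web_uploads=0),
    WeeklyRecord("u2", "2026-W15", usb_events=0, files_copied=3, web_uploads=1),
    WeeklyRecord("u3", "2026-W15", usb_events=0, files_copied=0, web_uploads=7),
]
for alert in playbook_filter(weeks):
    print(alert.user, alert.techniques)
# u1 ('T1052.001',)
# u3 ('T1567',)
```

Each surviving record keeps its technique IDs. As a result, later stages, such as detection and LLM interpretation, could cite the ATT&CK behavior that caused a week to be retained.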
Future work will extend the framework to finer temporal granularities, particularly the daily and session levels, by applying the same pipeline so that results remain consistent across analysis levels. It will also explore semi-supervised learning to adapt to evolving insider threat tactics.
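Reusing one pipeline across granularities can be illustrated by making the time bucket the only thing that changes. This minimal sketch assumes events arrive as hypothetical `(user, timestamp, activity)` tuples and that working hours run from 08:00 to 18:00. Under those assumptions, the same counting code produces either daily or ISO-weekly feature vectors. Session-level bucketing would additionally require pairing logon and logoff events, which is omitted here.

```python
from collections import Counter, defaultdict
from datetime import datetime

def period_key(ts: datetime, granularity: str) -> str:
    """Label of the time bucket that `ts` falls into at the chosen granularity."""
    if granularity == "day":
        return ts.date().isoformat()
    if granularity == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unsupported granularity: {granularity!r}")

def aggregate(events, granularity: str, work_start: int = 8, work_end: int = 18):
    """Per-(user, period) activity counts; identical logic for every granularity."""
    features = defaultdict(Counter)
    for user, ts, activity in events:
        bucket = features[(user, period_key(ts, granularity))]
        bucket[activity] += 1
        if not work_start <= ts.hour < work_end:   # outside assumed working hours
            bucket[f"after_hours_{activity}"] += 1
    return {key: dict(counts) for key, counts in features.items()}

events = [
    ("u1", datetime(2026, 4, 6, 9, 15), "logon"),   # Monday, working hours
    ("u1", datetime(2026, 4, 6, 22, 40), "usb"),    # Monday, after hours
    ("u1", datetime(2026, 4, 8, 23, 5), "logon"),   # Wednesday, after hours
]
weekly = aggregate(events, "week")
# {('u1', '2026-W15'): {'logon': 2, 'usb': 1, 'after_hours_usb': 1, 'after_hours_logon': 1}}
daily = aggregate(events, "day")
# two buckets: ('u1', '2026-04-06') and ('u1', '2026-04-08')
```

Only `period_key` depends on the granularity. Any downstream filtering, modeling, or explanation code that consumes these dictionaries can therefore stay unchanged when moving from weekly to daily analysis.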
