Provenance-based Intrusion Detection Systems (IDSs) model the causal relationships between security events through a provenance graph and learn contextual information to detect Advanced Persistent Threats (APTs) effectively. However, existing provenance graph representation methods fail to fully reflect the characteristics of security domain data and the semantic information embedded in system logs, resulting in limited learning efficiency and detection accuracy. This paper proposes a provenance representation method that effectively captures security context from system log data. The proposed method improves the performance of provenance-based IDSs by combining (1) a provenance graph construction technique that transforms meaningful string attributes—such as command lines, process names, and file paths—into vector representations to extract semantic information in the security context, (2) a hybrid time–position embedding technique for capturing causal relationships between events, and (3) an iterative refinement learning strategy tailored to the characteristics of system log data. Experimental results using the DARPA Transparent Computing Engagement 3 (E3) benchmark dataset for APT detection demonstrate that our method achieves improved accuracy compared to existing approaches while significantly accelerating convergence during iterative training. These results suggest that the proposed embedding technique can more effectively capture abnormal temporal patterns, such as the long dwell times characteristic of APT attacks.
Gong et al. (Sat,) studied this question.
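To make the idea of a hybrid time–position embedding more concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes that each event's embedding concatenates a sinusoidal encoding of its order in the sequence with a sinusoidal encoding of the log-scaled gap since the previous event. The function names `sinusoidal` and `hybrid_time_position_embedding`, the `dim` parameter, and the log scaling are all illustrative assumptions; the abstract does not specify the actual formulation.

```python
import math


def sinusoidal(value, dim):
    """Encode a scalar as `dim` sinusoidal features (dim must be even)."""
    vec = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        vec.append(math.sin(value * freq))
        vec.append(math.cos(value * freq))
    return vec


def hybrid_time_position_embedding(timestamps, dim=8):
    """Return one vector of length 2 * dim per event.

    Assumption (not from the paper): the first half encodes the event's
    position in the sequence, the second half encodes the log-scaled time
    gap to the previous event. Timestamps are expected in ascending order;
    negative gaps are clamped to zero.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even integer")
    embeddings = []
    previous = None
    for position, ts in enumerate(timestamps):
        gap = 0.0 if previous is None else max(ts - previous, 0.0)
        position_part = sinusoidal(position, dim)
        time_part = sinusoidal(math.log1p(gap), dim)
        embeddings.append(position_part + time_part)
        previous = ts
    return embeddings


# Hypothetical event times in seconds: three quick events, then one a day later.
events = [0.0, 1.0, 2.0, 86402.0]
vectors = hybrid_time_position_embedding(events, dim=4)
```

In this sketch, two events with the same gap to their predecessor share the same time half but differ in the position half, while an event that follows a long pause gets a clearly different time half. That separation is one plausible way a model could pick up on long dwell times independently of event order.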