Abstract
Advanced Persistent Threats (APTs) are among the most serious cybersecurity threats today, posing a substantial risk to enterprises and organizations because of their stealthy and targeted nature. Data provenance-based methods are widely used for APT detection, but they often rely on specific rules and high-quality data because they are limited in capturing complete graph structures, which reduces their effectiveness in diverse detection environments. To overcome this issue, we propose APT-HERA, a model that employs heterogeneous graph representation learning to learn system behavior patterns and can adapt to environments with limited data. APT-HERA derives the embedding representations of the provenance graph from both homophily and heterogeneity perspectives, enabling a more comprehensive extraction of the rich structural information the graph contains. We evaluated APT-HERA on four public datasets. Experimental results demonstrate that APT-HERA achieves 98% precision in information-constrained detection scenarios, outperforming state-of-the-art methods, including MAGIC, Flash, and ThreaTrace, under such conditions.
Liu et al. (Thu,) studied this question.