Audit logs are a fundamental data source for system security. However, extracting high-value threat information from massive log data poses significant data management challenges. Traditional unsupervised models produce high false positive rates because they inherently equate statistical anomalies with malicious activity. Emerging Large Language Model (LLM) based solutions, despite their powerful semantic understanding, are constrained by high computational cost and limited context window length, which makes it difficult for them to detect attacks within massive logs. Furthermore, the outputs of existing methods differ significantly from the practical attack reports that security analysts require. To overcome these limitations, this paper proposes ANTEATER, an end-to-end attack investigation framework that operates on raw logs and features a cascading "filter-then-scrutinize" architecture. The "Filter" stage is a lightweight, flow-based anomaly detection model that efficiently filters massive logs and reduces the volume of data to be investigated. The subsequent "Scrutinize" stage is an attack investigation model built on the collaboration of three LLM agents. It operates on a provenance graph constructed from the filtered anomalous logs: the agents collaboratively and autonomously explore and reconstruct the attack subgraph, then generate a structured natural-language report. ANTEATER thus mitigates the cost and context-window bottlenecks of LLMs, enabling it to tackle long-term, stealthy attacks, and bridges the critical gap between raw-data detection and the generation of readable attack reports.
Ren et al. (Mon,) studied this question.
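The cascading "filter-then-scrutinize" idea described in the abstract can be illustrated with a minimal sketch. This is not ANTEATER's implementation. The flow-based anomaly detector is replaced here by a simple rarity count, and the three collaborating LLM agents are replaced by a caller-supplied predicate `is_suspicious`. All names (`Event`, `filter_stage`, `build_provenance_graph`, `scrutinize_stage`, `render_report`) and the sample log entries are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict, deque
from typing import NamedTuple


class Event(NamedTuple):
    subject: str  # acting entity, e.g. a process
    action: str   # e.g. "read", "exec", "connect"
    obj: str      # target entity, e.g. a file or socket


def filter_stage(events, max_count=1):
    """Stand-in for the lightweight filter (assumption: rarity, not a flow model).

    Keeps only events whose (action, obj) pattern occurs at most max_count times.
    """
    counts = Counter((e.action, e.obj) for e in events)
    return [e for e in events if counts[(e.action, e.obj)] <= max_count]


def build_provenance_graph(events):
    """Build a directed graph: subject -> list of (action, obj) edges."""
    graph = defaultdict(list)
    for e in events:
        graph[e.subject].append((e.action, e.obj))
    return graph


def scrutinize_stage(graph, seed, is_suspicious):
    """Breadth-first exploration from seed over edges accepted by is_suspicious.

    is_suspicious is a placeholder for the judgement the LLM agents would make.
    Returns the reconstructed subgraph as a list of (subject, action, obj) edges.
    """
    visited = {seed}
    queue = deque([seed])
    subgraph = []
    while queue:
        node = queue.popleft()
        for action, target in graph.get(node, []):
            if not is_suspicious(node, action, target):
                continue
            subgraph.append((node, action, target))
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return subgraph


def render_report(subgraph):
    """Render the subgraph as a numbered, human-readable list of steps."""
    return "\n".join(
        f"{i}. {s} --{a}--> {o}" for i, (s, a, o) in enumerate(subgraph, 1)
    )


if __name__ == "__main__":
    # Hypothetical log: frequent benign reads plus a short, rare attack chain.
    logs = [Event("bash", "read", "/etc/hosts")] * 3 + [
        Event("bash", "exec", "dropper"),
        Event("dropper", "read", "/etc/shadow"),
        Event("dropper", "connect", "10.0.0.9:443"),
    ]
    anomalous = filter_stage(logs)
    graph = build_provenance_graph(anomalous)
    attack = scrutinize_stage(graph, "bash", lambda s, a, o: True)
    print(render_report(attack))
```

The point of the cascade is visible in the data flow: the expensive exploration step only ever sees the events that survive the cheap filter, so its cost depends on the size of the filtered graph and not on the size of the raw log.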