What question did this study set out to answer?

This research aims to enhance network intrusion detection systems with explainability and actionable reporting using LLMs.

April 10, 2026 · Open Access

An LLM-Based Agentic Network Traffic Incident-Report Approach Towards Explainable-AI Network Defense

Key Points

Developed a graph-based multi-agent framework that integrates ensemble machine learning with LLM-generated incident reports via Retrieval-Augmented Generation (RAG).
Used a lightweight Random Forest (6 MB model) for binary pre-detection, reaching 99.49% accuracy.
Implemented ensemble classification with MLP, Random Forest, and XGBoost, using soft voting for the final prediction and SHAP for feature attribution.
Generated evidence-grounded incident reports with a ReAct-based summary agent that combines classification results with external threat intelligence.
Achieved ensemble accuracy exceeding 99.8% across 11 attack classes on the ACI IoT Network Dataset 2023.
Obtained a perfect groundedness score of 1.0, meaning every generated claim derives from the retrieved context, alongside a moderate faithfulness score of 0.64.
Improved the UDP Flood F1 score from 48% with MLP alone to 95% with the soft-voting ensemble.
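
To make the soft-voting step concrete, the following is a minimal sketch of how averaging class probabilities can overrule one weak model. The function name `soft_vote`, the three-class setup, and all probability values are illustrative assumptions, not taken from the paper, which may rely on a library implementation instead.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    prob_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Average per-class probabilities across models and return (argmax, averages)."""
    if not prob_sets:
        raise ValueError("need probabilities from at least one model")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must score the same number of classes")
    if weights is None:
        weights = [1.0] * len(prob_sets)  # equal weighting (assumption)
    total = sum(weights)
    averaged = [
        sum(w * p[c] for w, p in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Made-up probabilities for one flow over three hypothetical classes.
mlp = [0.50, 0.45, 0.05]  # MLP alone would pick class 0
rf = [0.10, 0.85, 0.05]
xgb = [0.15, 0.80, 0.05]

label, avg = soft_vote([mlp, rf, xgb])
print(label, [round(a, 2) for a in avg])
```

In this toy case the MLP's top choice is class 0, but the averaged probabilities favor class 1, which is the kind of correction soft voting provides when one model is unreliable on a particular class.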

Abstract

Traditional intrusion detection systems for IoT networks achieve high classification accuracy but lack interpretability and actionable incident-response capabilities, limiting their operational value in security-critical environments. This paper presents a graph-based multi-agent framework that integrates ensemble machine learning with Large Language Model (LLM)-powered incident report generation via Retrieval-Augmented Generation (RAG). The system employs a three-phase architecture: (1) a lightweight Random Forest binary pre-detection stage, achieving 99.49% accuracy with a 6 MB model size for edge deployment; (2) ensemble classification combining Multi-Layer Perceptron, Random Forest, and XGBoost with soft voting and SHAP-based feature attribution for explainability; and (3) a ReAct-based summary agent that synthesizes classification results with external threat intelligence from web search and scholarly databases to generate evidence-grounded incident reports. To address the challenge of evaluating non-deterministic LLM outputs, we introduce two custom RAG evaluation metrics, faithfulness and groundedness, implemented via the LLM-as-Judge framework. Experimental validation on the ACI IoT Network Dataset 2023 demonstrates ensemble accuracy exceeding 99.8% across 11 attack classes; a perfect groundedness score (1.0), indicating that all generated claims derive from the retrieved context; and moderate faithfulness (0.64), reflecting appropriate analytical synthesis. The ensemble approach mitigates individual model weaknesses, improving the UDP Flood F1 score from 48% (MLP alone) to 95% through soft voting. This work bridges the gap between high-accuracy detection and trustworthy, actionable security analysis for automated incident-response systems.
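
As a rough illustration of what a groundedness score of 1.0 could mean, the sketch below computes the fraction of report claims that a judge marks as supported by the retrieved context. The abstract does not give the scoring rule or judge prompt, so this ratio, the `groundedness` function, and the `keyword_judge` stand-in (a simple word-overlap check used in place of a real LLM judge) are all assumptions made for illustration.

```python
from typing import Callable, Sequence

# A judge takes (claim, context) and says whether the context supports the claim.
Judge = Callable[[str, str], bool]


def groundedness(claims: Sequence[str], context: str, judge: Judge) -> float:
    """Fraction of claims the judge considers supported by the context."""
    if not claims:
        raise ValueError("no claims to score")
    supported = sum(1 for claim in claims if judge(claim, context))
    return supported / len(claims)


def keyword_judge(claim: str, context: str) -> bool:
    """Toy stand-in for an LLM judge: every word of the claim must occur in the context."""
    context_words = set(context.lower().split())
    return all(word in context_words for word in claim.lower().split())


# Hypothetical retrieved context and report claims.
context = "udp flood traffic was detected from host a"
claims = ["udp flood detected", "traffic from host a"]

score = groundedness(claims, context, keyword_judge)
print(score)
```

Adding a claim that the context does not support, such as "ransomware was detected", lowers the score below 1.0 under this definition. A production setup would replace `keyword_judge` with a call to an LLM that returns a supported or unsupported verdict.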
