The rapid expansion of digital infrastructures has intensified cybersecurity challenges, necessitating adaptive and intelligence-driven approaches to risk management. This study addresses the gap between exploratory LLM-based cybersecurity research and deployment-oriented evaluation under big-data SOC conditions. Instruction-tuned large language models (LLMs), namely GPT-3.5 Turbo and Mistral-7B, are evaluated for cyber threat detection by reformulating structured network telemetry into natural-language prompts. A deployment-aware evaluation framework is adopted, jointly assessing predictive performance, inference efficiency, and resource utilization. Experimental results indicate that Mistral-7B consistently outperforms GPT-3.5 Turbo and classical ML/DL baselines, achieving an overall accuracy of 0.9936 and a Cohen’s Kappa of 0.9925. Mistral-7B further achieves lower average inference latency (23.1 ms) and improved tail latency (P95 = 40.7 ms), while reducing peak memory usage during inference compared to GPT-3.5 Turbo. These results highlight the suitability of LLMs for scalable and explainable enterprise-level cybersecurity risk management.
Sorour et al. studied this question.
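To make the pipeline summarized above concrete, the following is a minimal sketch of two of its ingredients: serializing a structured telemetry record into a natural-language prompt, and computing the reported kinds of metrics (accuracy, Cohen's Kappa, and P95 latency). The field names, the prompt wording, the label set, and all function names are illustrative assumptions; the study's actual prompt template and evaluation code are not given here, so this should not be read as the authors' implementation.

```python
import math
from collections import Counter


def flow_to_prompt(flow: dict) -> str:
    """Serialize one structured flow record into a natural-language prompt.

    The template below is hypothetical; the study's real prompt is not shown.
    """
    features = ", ".join(f"{key} is {value}" for key, value in flow.items())
    return (
        "You are a network security analyst. "
        f"A network flow has these attributes: {features}. "
        "Answer with a single traffic label."
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Undefined when p_e == 1 (both sides use a single identical label).
    return (p_o - p_e) / (1 - p_e)


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile, e.g. q=95 for P95 latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical record; keys are placeholders, not the dataset's schema.
    flow = {"protocol": "TCP", "dst_port": 443, "duration_ms": 120}
    print(flow_to_prompt(flow))

    y_true = ["benign", "benign", "dos", "dos"]
    y_pred = ["benign", "benign", "dos", "benign"]
    print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))

    latencies_ms = list(range(1, 21))  # toy per-request latencies
    print(percentile_nearest_rank(latencies_ms, 95))
```

Reporting Cohen's Kappa alongside accuracy is useful for intrusion data because class imbalance can inflate raw accuracy. In the toy labels above, accuracy is 0.75 while Kappa is 0.5. Nearest-rank is only one of several percentile conventions, and interpolating definitions can give slightly different P95 values on small samples.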