What question did this study set out to answer?

The aim is to develop a framework leveraging large language models to improve the extraction and verification of cyber threat indicators.

March 25, 2026 · Open Access

LLM-Powered Proactive Cyber-Defense Framework Using Cyber-Threat Indicators Collected from X Platform

Key points

Integrated large language models across the CTI pipeline
Utilized data augmentation and hybrid classification for analysis
Implemented expert-in-the-loop validation for operational reliability
Conducted experimental evaluations on model robustness and precision
Assessed technology acceptance factors through regression analysis
Achieved a precision of 98.87% while reducing false-positive alerts
Improved model robustness and generalization through LLM-driven data augmentation
Validated IoC reports showed strong alignment and high semantic adequacy
Identified that perceived usefulness and trust in automation are key predictors of technology acceptance
Expert evaluations confirmed the linguistic quality of augmented samples

Abstract

Security organizations increasingly rely on cyber threat intelligence (CTI) sharing to enhance their resilience against cyberattacks. Indicators of Compromise (IoCs) play a critical operational role in CTI by providing malicious artifacts that support threat detection, incident response, and proactive defense. However, the rapid growth of social media as a CTI source, characterized by short-text content, poses significant challenges to automated IoC extraction, contextual interpretation, operational integration, and reliable verification. To address these challenges, this study proposes a comprehensive framework that integrates Large Language Models (LLMs) across multiple stages of the CTI pipeline. The framework leverages LLM-driven data augmentation, a hybrid classification model, and contextual summarization to enhance short-text understanding while supporting expert-in-the-loop validation for operational reliability. Extensive experimental evaluations demonstrate that LLM-driven data augmentation substantially improves model robustness and generalization while reducing false-positive alerts, achieving a precision of 98.87%. Quantitative diversity analysis and expert-based human evaluation further confirm the linguistic quality and correctness of the generated augmented samples. In addition, IoC reports are validated using both reference-based and reference-free evaluation metrics, which show strong alignment and high semantic adequacy. Moreover, a technology acceptance model was integrated with cybersecurity domain constructs to assess the factors that drive acceptance of the proposed framework. Regression analysis showed that perceived usefulness, behavioral intention, trust in automation, and risk were the strongest predictors of actual use. These predictors are commonly interpreted as indicators of technology acceptance.
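To make the IoC extraction problem concrete, the sketch below pulls a few indicator types out of a short post. It is a plain regular-expression baseline and not the LLM-based pipeline the abstract describes; the names `IOC_PATTERNS`, `refang`, and `extract_iocs` are hypothetical, and the patterns cover only IPv4 addresses, MD5 and SHA-256 hashes, and HTTP(S) URLs. It assumes indicators may be "defanged" (for example `hxxps://` or `[.]`), a common convention when sharing them publicly.

```python
import re

# Illustrative regex baseline only; NOT the paper's LLM-based method.
# All names below are hypothetical.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IOC_PATTERNS = {
    "ipv4": re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "url": re.compile(r"\bhttps?://[^\s<>\"']+", re.IGNORECASE),
}


def refang(text: str) -> str:
    """Undo common defanging so that the patterns can match."""
    text = re.sub(r"\[\.\]|\(\.\)", ".", text)  # [.] or (.) -> .
    text = text.replace("[:]", ":")              # [:] -> :
    # hxxp:// or hxxps:// -> http:// or https://
    return re.sub(r"\bhxxp(s?)://", r"http\1://", text, flags=re.IGNORECASE)


def extract_iocs(post: str) -> dict[str, list[str]]:
    """Return the sorted, de-duplicated matches for each indicator type."""
    clean = refang(post)
    return {
        kind: sorted(set(pattern.findall(clean)))
        for kind, pattern in IOC_PATTERNS.items()
    }


if __name__ == "__main__":
    # Made-up example post using documentation-reserved address and domain.
    post = "C2 at 192.0.2[.]10 payload hxxps://example[.]com/a.exe"
    print(extract_iocs(post))
```

A baseline like this captures surface patterns only. It cannot tell whether a matched address is malicious or merely mentioned, which is the kind of contextual interpretation and verification the abstract assigns to the LLM stages and to expert review.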
