This study proposes a methodological framework for extending Malware Information Sharing Platform (MISP) taxonomies in the domain of Dark Web drug forums through the integration of large language models (LLMs) and Human-in-the-Loop (HITL) validation. The research addresses the ontological gap between traditional MISP taxonomies, which focus on technical or chemical indicators, and the linguistic and morphological complexity of illicit digital markets. By modelling the primary physical form as an ontological predicate with mutually exclusive values (for example, powder, pill–tablet–capsule, liquid, and plant-matter), the proposed approach captures the material dimension of the discourse, enhancing semantic disambiguation and forensic traceability. Morphology classification was performed with the Mistral 7B model on a stratified analytical subset of 2904 drug-related Dark Web posts, extracted from a final corpus of 6456 posts after data cleaning and relevance filtering. In the first pass, 76.48% of posts were directly assigned to one of the base morphological categories, while 23.52% were labelled as unclear and subsequently reviewed in the HITL stage. Following HITL refinement and full reclassification, the proportion of posts labelled as unclear decreased from 23.52% to 11.29%, corresponding to a 51.99% relative reduction in ambiguity. Network visualisation with VOSviewer revealed three major discursive axes (recreational–commercial, pharmaceutical–opioid, and transnational–logistical), reflecting the hybrid semantic structure of digital drug markets. The results show that combining LLM-based inference with expert oversight improves the interpretability, reproducibility and ontological robustness of cyberintelligence models, offering a replicable framework for other sensitive domains such as terrorism or child exploitation.
Medina-Merodio et al. studied this question.
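As an illustration of the idea described in the abstract, the following minimal Python sketch shows how a primary-physical-form predicate with mutually exclusive values could be rendered as MISP-style machine tags (`namespace:predicate="value"`), and how the reported relative reduction in ambiguity can be checked from the published percentages. The namespace `darkweb-drugs`, the predicate name `primary-physical-form`, and all function names are hypothetical and not taken from the study. The rule that sends a post to `unclear` whenever zero or several base forms are proposed is likewise an assumption made for illustration, not the authors' documented procedure.

```python
from typing import Iterable

# Hypothetical identifiers: the study does not state its namespace or predicate names.
NAMESPACE = "darkweb-drugs"
PREDICATE = "primary-physical-form"

# Base categories given as examples in the abstract, plus the "unclear" label.
BASE_FORMS = ("powder", "pill-tablet-capsule", "liquid", "plant-matter")
UNCLEAR = "unclear"


def machine_tag(value: str) -> str:
    """Build a MISP-style triple tag of the form namespace:predicate="value"."""
    if value not in BASE_FORMS and value != UNCLEAR:
        raise ValueError(f"unknown morphology value: {value!r}")
    return f'{NAMESPACE}:{PREDICATE}="{value}"'


def assign_form(candidates: Iterable[str]) -> str:
    """Enforce mutual exclusivity (assumed rule).

    Exactly one recognised base form is accepted as the label; zero or
    several recognised forms yield "unclear", which would then be routed
    to human review.
    """
    valid = set(candidates) & set(BASE_FORMS)
    if len(valid) == 1:
        return next(iter(valid))
    return UNCLEAR


def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Relative reduction, in percent, between two percentage shares."""
    return (before_pct - after_pct) / before_pct * 100.0


if __name__ == "__main__":
    print(machine_tag(assign_form(["powder"])))
    print(machine_tag(assign_form(["powder", "liquid"])))
    print(f"{relative_reduction(23.52, 11.29):.3f}")
```

Computed from the rounded shares of 23.52% and 11.29%, the reduction comes out at roughly 52%, which agrees with the reported 51.99% to within rounding.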