Detecting Thai-English code-mixed malicious scripts within high-trust domains such as governmental and academic portals is a significant defensive challenge. Transformer-based architectures excel at semantic parsing, but they often exhibit ‘Structural Bias’: because token-limited models suffer from an inherent ‘Attention Deficit,’ they misinterpret the high-entropy syntax of benign legacy HyperText Markup Language (HTML) as malicious obfuscation. To address this, we propose an Explainable AI (XAI)-Driven Hybrid Architecture grounded in a ‘Selective Integration’ strategy. Unlike traditional hybrid models, our framework mathematically formalizes the fusion process, combining context-aware WangChanBERTa embeddings with orthogonal structural statistics through Dempster-Shafer Theory and Conditional Mutual Information (CMI). Validated on a high-fidelity corpus, the proposed model achieved an F1-score of 0.9908, outperforming standalone Transformers, Random Forest, and unsupervised baselines. XAI diagnostics revealed a ‘Dual-Validation’ mechanism in which structural features act as an epistemic anchor. This mechanism triggers a ‘Semantic Veto’ that filters hallucinations caused by benign complexity, yielding a False Positive Rate (FPR) of 0.0116. Our findings indicate that hybridization is most effective when engineered features are mathematically orthogonal to the semantic embeddings. This work offers a robust, theoretically grounded framework for securing critical digital infrastructures in low-resource linguistic environments.
Teppap et al. (Mon,) studied this question.
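To make the Dempster-Shafer fusion named in the abstract concrete, the following is a minimal sketch of Dempster's rule of combination over a two-class frame (benign, malicious). It is not the authors' implementation: the abstract does not say how belief masses are derived from the WangChanBERTa embeddings or the structural statistics, so the `semantic` and `structural` mass values, and the function name `combine`, are hypothetical and chosen only for illustration.

```python
from itertools import product

# Frame of discernment: a script is either benign or malicious.
BENIGN = frozenset({"benign"})
MALICIOUS = frozenset({"malicious"})
THETA = BENIGN | MALICIOUS  # the full frame, i.e. "undecided"


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination."""
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            # Agreeing evidence supports the intersection of the two sets.
            fused[overlap] = fused.get(overlap, 0.0) + mass_a * mass_b
        else:
            # Contradictory evidence accumulates as conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalise by 1 - K so the fused masses sum to one.
    return {s: v / (1.0 - conflict) for s, v in fused.items()}


# Hypothetical masses (assumption, not from the paper): the semantic branch
# leans malicious, while the structural branch strongly supports benign.
semantic = {MALICIOUS: 0.6, BENIGN: 0.1, THETA: 0.3}
structural = {MALICIOUS: 0.05, BENIGN: 0.8, THETA: 0.15}

fused = combine(semantic, structural)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

With these illustrative inputs, the fused mass on benign exceeds the fused mass on malicious even though the semantic branch alone favoured malicious. This loosely mirrors the ‘Semantic Veto’ described above, in which structural evidence overrides a semantic false alarm.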