What question did this study set out to answer?

The central aim is to enhance the detection of malicious Thai-English code-mixed scripts while minimizing false positives.

March 12, 2026 | Open Access

Beyond Semantic Noise: A Dual-Verification Framework for Thai–English Code-Mixed Malicious Script Detection via XAI-Guided Selective Integration

Key Points

Aimed to improve detection of malicious Thai-English code-mixed scripts while minimizing false positives.
Proposed an Explainable AI (XAI)-driven hybrid architecture built on a Selective Integration strategy.
Mathematically formalized the fusion of context-aware WangChanBERTa embeddings with orthogonal structural statistics.
Used Dempster-Shafer Theory and Conditional Mutual Information (CMI) to carry out that fusion.
Validated the model on a high-fidelity corpus.
Achieved a state-of-the-art F1-score of 0.9908, outperforming standalone Transformers, Random Forest, and unsupervised baselines.
Achieved a False Positive Rate (FPR) of 0.0116.
Revealed, through XAI diagnostics, a Dual-Validation mechanism in which structural features act as an epistemic anchor for the semantic prediction.
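
For readers less familiar with the two headline metrics, the sketch below shows how an F1-score and a False Positive Rate are conventionally derived from confusion-matrix counts. The function name `f1_and_fpr` and the counts are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its reported values.

```python
def f1_and_fpr(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (F1, FPR) from confusion-matrix counts.

    F1  = 2*TP / (2*TP + FP + FN)   (harmonic mean of precision and recall)
    FPR = FP / (FP + TN)            (share of benign samples flagged as malicious)
    """
    if 2 * tp + fp + fn == 0 or fp + tn == 0:
        raise ValueError("metrics are undefined for these counts")
    f1 = 2 * tp / (2 * tp + fp + fn)
    fpr = fp / (fp + tn)
    return f1, fpr


# Hypothetical counts, NOT taken from the study.
f1, fpr = f1_and_fpr(tp=90, fp=10, tn=890, fn=10)
print(f"F1 = {f1:.4f}, FPR = {fpr:.4f}")
```

A low FPR matters in this setting because every false positive is a benign page, such as a legacy government or academic portal, wrongly flagged as malicious.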

Abstract

In the evolving cybersecurity landscape, detecting Thai-English code-mixed malicious scripts within high-trust domains such as governmental and academic portals presents a significant defensive challenge. While Transformer-based architectures excel in semantic parsing, they often exhibit ‘Structural Bias,’ misinterpreting the high-entropy syntax of benign legacy HyperText Markup Language (HTML) as malicious obfuscation due to inherent ‘Attention Deficit’ in token-limited models. To address this, we propose an Explainable AI (XAI)-Driven Hybrid Architecture grounded in a ‘Selective Integration’ strategy. Unlike traditional hybrid models, our framework mathematically formalizes the fusion process by synergizing context-aware WangChanBERTa embeddings with orthogonal structural statistics through Dempster-Shafer Theory and Conditional Mutual Information (CMI). The proposed model was validated on a high-fidelity corpus, achieving a state-of-the-art F1-score of 0.9908, significantly outperforming standalone Transformers, Random Forest, and unsupervised baselines. XAI diagnostics revealed a ‘Dual-Validation’ mechanism where structural features act as an epistemic anchor. This mechanism effectively triggers a ‘Semantic Veto’ to filter hallucinations caused by benign complexity, achieving a remarkably low False Positive Rate (FPR) of 0.0116. Our findings demonstrate that hybridization is most effective when engineered features provide mathematical orthogonality to semantic embeddings. This work offers a robust, theoretically grounded framework for securing critical digital infrastructures in low-resource linguistic environments.
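
To make the fusion idea more concrete, the following is a minimal sketch of Dempster's rule of combination applied to two evidence sources over the frame {malicious, benign}. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how mass values are derived from the embeddings or the structural statistics, so the numbers, the `combine` helper, and the variable names here are hypothetical.

```python
from itertools import product

MAL = frozenset({"malicious"})
BEN = frozenset({"benign"})
THETA = MAL | BEN  # the whole frame, i.e. "uncommitted" belief


def combine(m1: dict, m2: dict) -> tuple[dict, float]:
    """Dempster's rule: fuse two mass functions and return (fused, conflict K)."""
    fused: dict = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two hypotheses.
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            # Contradictory evidence (malicious vs. benign) accumulates in K.
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize by 1 - K so the fused masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in fused.items()}, conflict


# Hypothetical masses: the semantic source leans malicious,
# while the structural source is fairly sure the page is benign.
semantic = {MAL: 0.70, BEN: 0.10, THETA: 0.20}
structural = {MAL: 0.05, BEN: 0.85, THETA: 0.10}

fused, k = combine(semantic, structural)
print("conflict K =", round(k, 4))
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

In this toy case the strong structural evidence for "benign" outweighs the semantic suspicion after fusion, which is one plausible reading of how a structural signal could veto a semantic false alarm. How the paper actually implements its 'Semantic Veto', and where Conditional Mutual Information enters, is not specified in the abstract.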
