Abstract
Online gambling operators increasingly evade regulation by concealing promotional content within watermarked advertising images across social media and compromised domains. Traditional text-centric monitoring fails in these scenarios, particularly in multilingual environments where visual obfuscation masks critical semantic cues. This paper proposes a hybrid multimodal framework that explicitly models fine-grained interactions between OCR-extracted text and visual structures. The architecture uses a Vision Transformer (ViT) for spatial feature encoding and XLM-RoBERTa for cross-lingual semantic representation, integrated via a text-guided cross-modal attention (CMA) mechanism. This lets the model attend to specific image regions based on the extracted textual tokens, uncovering hidden promotional signals. Evaluated on a newly curated dataset of 4,485 manually verified multilingual watermarked advertising images, the framework achieves an accuracy of 0.9947 and an F1-score of 0.9947 with perfect recall (1.0000), consistently outperforming late-fusion and modern encoder-pair baselines (ViT+mBERT). The findings indicate that visual cues provide strong complementary discriminative signals, and that CMA remains robust to OCR noise and linguistic variation, producing zero false negatives under full supervision. This study provides a scalable, high-precision solution for cross-border regulatory monitoring in adversarial digital ecosystems.
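To make the text-guided attention idea concrete, the following is a minimal sketch of scaled dot-product cross-attention in which text token embeddings act as queries and image patch embeddings act as keys and values. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy 4-dimensional vectors, and the omission of learned projection matrices and multiple heads are all simplifications introduced here. In the framework described above, the patch embeddings would come from ViT and the token embeddings from XLM-RoBERTa.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_attention(text_tokens, image_patches):
    """Cross-attention with text tokens as queries, image patches as keys/values.

    Hypothetical helper for illustration: no learned projections, single head.
    Returns (attended, weights), where weights[i][j] is how strongly text
    token i attends to image patch j, and attended[i] is the resulting
    weighted mixture of patch vectors for token i.
    """
    dim = len(text_tokens[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for query in text_tokens:
        # Similarity between this text token and every image patch.
        scores = [
            sum(q * k for q, k in zip(query, patch)) / scale
            for patch in image_patches
        ]
        row = softmax(scores)
        # Weighted sum of patch vectors, one output vector per text token.
        mixed = [
            sum(row[j] * image_patches[j][c] for j in range(len(image_patches)))
            for c in range(dim)
        ]
        weights.append(row)
        attended.append(mixed)
    return attended, weights


# Toy embeddings (made-up values): 2 OCR tokens, 3 image patches, dimension 4.
text_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
image_patches = [
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
]

attended, weights = text_guided_attention(text_tokens, image_patches)
for i, row in enumerate(weights):
    print(f"token {i} attends most to patch {row.index(max(row))}")
```

Each row of `weights` is a probability distribution over image patches, so a token extracted by OCR concentrates on the regions whose visual features align with it. A production model would typically add learned query, key, and value projections and several attention heads.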
Abdul Azzam Ajhari
University of Indonesia
Rizal Fathoni Aji
University of Indonesia
Aprinaldi Jasa Mantau
Journal of King Saud University - Computer and Information Sciences
University of South Australia
University of Indonesia
National Nuclear Energy Agency of Indonesia
Ajhari et al. (Sat,) studied this question.
synapsesocial.com/papers/69dc89183afacbeac03ead15 — DOI: https://doi.org/10.1007/s44443-026-00725-3