Perpetrators of cyber-hate increasingly use code-switching, the alternation of languages within a single text, to evade automated moderation systems. While machine learning has advanced hate speech detection for monolingual content, these systems struggle with the widespread and complex reality of multilingual and low-resource communication. To address this gap, this study conducted a systematic literature review of 400 studies (2013–July 2025) to evaluate machine learning for detecting cyber-hate in code-switched texts. It identified several key challenges: limited or unavailable datasets, an overreliance on bilingual data, a regional concentration of studies (notably in India), underrepresentation of African and Latin American languages, and narrow evaluation metrics. The analysis confirmed that while transformer-based models perform well on high-resource language pairs such as Hindi-English, their performance drops sharply on low-resource pairs such as English-Swahili because of data scarcity and linguistic complexity. Synthesizing the proposed solutions, the study concludes with a research roadmap that prioritizes (1) validated, open-access multilingual data; (2) models optimized for low-resource settings; (3) ethical safeguards for fairness and privacy; and (4) expanded evaluation metrics that include bias and interpretability. The study thus provides a diagnostic overview of the field and actionable guidance for building inclusive, context-aware hate speech detection systems. Its scope extends to the related problems of toxicity and abusive language, which often overlap with cyber-hate and contribute to online hostility. Accordingly, the review also explores how machine learning can be designed to detect and mitigate this broader spectrum of harmful content in multilingual and code-switched environments.
Mullah et al. studied this question.