What does this research mean for the field?

Machine learning systems for detecting cyber-hate in code-switched texts face significant challenges due to data scarcity and linguistic complexity, particularly for low-resource language pairs.

What question did this study set out to answer?

This study aims to evaluate how effectively machine learning detects cyber-hate in code-switched texts and to identify the key challenges.

February 26, 2026 | Open Access

Machine learning intervention on cyber-hate in code-switch texts: a systematic review with open challenges and solutions

Key Points

This study aims to evaluate how effectively machine learning detects cyber-hate in code-switched texts and to identify the key challenges.
Conducted a systematic literature review of 400 studies from 2013 to July 2025.
Assessed the performance of existing models for multilingual hate speech detection.
Identified gaps in available datasets and evaluation methods.
System performance dropped sharply for low-resource language pairs such as English-Swahili.
Key challenges included limited and unavailable datasets and a regional concentration of studies in India.
Proposed solutions involved creating open-access data and developing models suited for low-resource languages.

Abstract

Perpetrators of cyber-hate are increasingly using code-switching, which is the alternation of languages within a single text, to evade automated moderation systems. While machine learning has advanced hate speech detection for monolingual content, these systems struggle to adapt to the widespread and complex reality of multilingual and low-resource communication. To address this gap, this study conducted a systematic literature review of 400 studies (2013–July 2025) to evaluate machine learning for detecting cyber-hate in code-switched texts. It identified key challenges: limited and unavailable datasets, an overreliance on bilingual data, a regional concentration of studies (notably India), underrepresentation of African and Latin American languages, and narrow evaluation metrics. The analysis confirmed that while transformer-based models excel for high-resource language pairs like Hindi-English, their performance drops sharply for low-resource pairs like English-Swahili due to data scarcity and linguistic complexity. Synthesizing proposed solutions, the study concludes with a research roadmap prioritizing: (1) validated, open-access multilingual data; (2) models optimized for low-resource settings; (3) ethical safeguards for fairness and privacy; and (4) expanded evaluation metrics that include bias and interpretability. This study provides a diagnostic overview of the field and actionable guidance for building inclusive and context-aware hate speech detection systems. Its scope also extends to the related problems of toxicity and abusive language, which often overlap with cyber-hate and contribute to online hostility. Consequently, the review also explores how machine learning can be designed to detect and mitigate this broader spectrum of harmful content in multilingual and code-switched environments.
