What question did this study set out to answer?

This research aims to develop effective methods for detecting hate speech in code-mixed languages, specifically addressing challenges in resource-constrained environments.

March 10, 2026

Detecting Hate Speech in Code‐Mixed Languages: A Robust Study Leveraging Data Augmentation, Transfer Learning, and Model Explainability Techniques in Resource‐Constrained Settings

Key Points

Utilized data augmentation techniques to balance the dataset.
Applied transfer learning methods to enhance model performance.
Evaluated the proposed method on Malayalam-English code-mixed text.
Enhanced model explainability by visualizing neuron outputs.
Achieved a weighted F1 score exceeding 0.98, outperforming current state-of-the-art models.
Effectively addressed data imbalance in hate speech detection.
Showcased distinct neuron firing patterns for offensive vs. non-offensive inputs.
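
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a weighted F1 score is computed: the F1 of each class is averaged with weights proportional to how often that class occurs among the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of y_true."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true examples of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical toy labels: three non-offensive posts and one offensive post.
y_true = ["not", "not", "not", "off"]
y_pred = ["not", "not", "off", "off"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a high weighted F1 on an imbalanced dataset can still hide weak minority-class performance, which is one reason per-class scores are usually worth inspecting alongside it.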

Abstract

Hate speech is a harmful form of expression that promotes discrimination, hostility, and prejudice towards a specific group of individuals or communities. Developing effective methods for detecting hate speech on online platforms has become increasingly essential for promoting inclusivity and protecting individuals from its negative effects. Despite the rising use of multilingual social media platforms, hate speech/offensive language (HS/OL) detection in code‐mixed (CM) languages has not received the same level of attention from the research community as the monolingual case. CM languages pose a challenge because multiple languages are mixed within a single sentence or text, which makes text representation and language modeling difficult. In addition, data imbalance is a common issue in HS/OL detection: HS/OL is often a rare event, so the majority class tends to dominate the dataset. The present study proposes a new method that addresses these challenges by using data augmentation techniques to balance the dataset and leveraging transfer learning to improve model performance. The method was evaluated on Malayalam–English code‐mixed text, which poses additional challenges for HS/OL detection due to limited labeled data and a lack of resources. The results indicate that the method is effective: its weighted F1 score exceeds 0.98, surpassing the most advanced and up‐to‐date models. The paper also enhances the model's explainability with a technique that visualizes each neuron's output in the last layer of the transfer models. This technique highlights the disparate firing patterns of the neurons when presented with offensive and non‐offensive inputs.
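
The abstract does not say which augmentation techniques were used to balance the dataset. As a rough illustration of what class balancing involves, the sketch below uses random oversampling, a simpler stand-in that duplicates minority-class examples until every class matches the largest one. The function name `oversample` and the sample texts are hypothetical, and this is not the study's method.

```python
import random
from collections import Counter, defaultdict


def oversample(texts, labels, seed=0):
    """Duplicate minority-class texts at random until all classes are equal in size.

    Note: random oversampling is an assumed stand-in, not the paper's augmentation.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra copies (with replacement) to reach the majority-class size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical placeholder texts: three non-offensive, one offensive.
texts = ["post a", "post b", "post c", "post d"]
labels = ["not", "not", "not", "off"]
balanced_texts, balanced_labels = oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling only repeats existing examples, whereas augmentation in the usual sense generates new or modified examples, so the two should not be treated as equivalent.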


Cite This Study

Varma et al. studied this question.

synapsesocial.com/papers/69af958570916d39fea4d290 https://doi.org/10.1111/coin.70204
