Hate speech in political discourse on social media poses significant challenges because it spreads rapidly and harms the individuals it targets. Effective Natural Language Processing (NLP) techniques are essential for detecting and addressing hate speech online. This work compares a BERT-based encoder model with several decoder transformer models for classifying hate speech in Brazilian Portuguese, specifically within the political context. To enhance model effectiveness, we employ several Cross-lingual Learning (CLL) strategies using two linguistically similar languages, Italian and Spanish, and evaluate model performance on a separate dataset not used for training. Our findings reveal that the encoder achieved an F1-score of 96.92%, and the best decoder model attained an F1-score of 93.88%. Furthermore, the encoder model trained with CLL outperformed previously published results on unseen data, highlighting the potential of CLL with lexically close languages for hate speech detection in Portuguese.
Oliveira et al. studied this question.