Background: The rise of hate speech on social media, especially during the COVID-19 pandemic, poses serious threats to psychological well-being and social cohesion. Automated detection tools exist, but they often fail to grasp context and cultural nuance. This study explores the integration of Critical Discourse Analysis (CDA) with automated detection to improve the accuracy and fairness of toxic language detection on digital platforms.

Aim: This study examines toxic language on social media by combining automated detection based on machine learning with CDA, in order to understand how hate speech is produced, disseminated, and normalized within digital spaces.

Method: This study employs a qualitative-critical design. Data were collected by crawling public posts on social media platforms (Twitter and Facebook) using specific keywords. Posts were screened for toxic language with a BERT-based machine learning classification model. From the automatic detection results, 200 posts were purposively selected for further analysis using CDA, focusing on text structure, discursive practices, and social practices.

Results: Of the 15,000 posts analyzed, 25.67% were classified as toxic language. The CDA showed that much of the toxic language was not explicit but was instead concealed through irony, humor, and metaphor. The most prevalent targets of hate speech were racial issues (45%), followed by religion (28%), gender (15%), and sexual orientation (12%). Social media serves not only as a medium for individual dissemination but also as an arena for the reproduction of discriminatory ideologies.

Conclusion: This study makes methodological contributions to the development of fairer and more context-aware digital content moderation systems and provides a foundation for policymakers to implement more effective regulations aimed at protecting digital spaces from hate speech.
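The two-stage workflow described in the Method (automated screening, then selection of a subset for CDA) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the study: `screen_posts`, `toxic_share`, `select_for_cda`, the 0.5 threshold, and the toy `keyword_score` function, which merely stands in for the BERT-based classifier. Ranking by score is likewise only a placeholder, since purposive selection depends on the researcher's judgment.

```python
from typing import Callable, List, Tuple

# Placeholder vocabulary for the toy scorer (hypothetical, not from the study).
PLACEHOLDER_TERMS = {"scum", "vermin"}


def keyword_score(text: str) -> float:
    """Toy stand-in for a BERT-based classifier: 1.0 if a placeholder term occurs, else 0.0."""
    words = set(text.lower().split())
    return 1.0 if words & PLACEHOLDER_TERMS else 0.0


def screen_posts(
    posts: List[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off; the study does not report one
) -> List[Tuple[str, float]]:
    """Return (post, score) pairs whose toxicity score meets the threshold."""
    flagged = []
    for post in posts:
        score = score_fn(post)
        if score >= threshold:
            flagged.append((post, score))
    return flagged


def toxic_share(n_flagged: int, n_total: int) -> float:
    """Percentage of posts flagged as toxic (0.0 for an empty corpus)."""
    return 100.0 * n_flagged / n_total if n_total else 0.0


def select_for_cda(flagged: List[Tuple[str, float]], k: int = 200) -> List[Tuple[str, float]]:
    """Shortlist up to k flagged posts for manual CDA.

    Sorting by score is only a placeholder: purposive selection is a
    researcher's decision and cannot be reduced to a ranking.
    """
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)[:k]


posts = [
    "Have a nice day",
    "Those people are scum",
    "What a lovely photo",
    "You are all vermin",
]
flagged = screen_posts(posts, keyword_score)
share = toxic_share(len(flagged), len(posts))
shortlist = select_for_cda(flagged, k=1)
```

In practice the scoring function would wrap a trained classifier, and the shortlist would be reviewed and adjusted by the analyst before the CDA stage.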
Kusuma et al. (Sun,) studied this question.