March 21, 2024 · Open Access

Enhancing Hate Speech Detection with Fine-Tuned Large Language Models Requires High-Quality Data

Abstract

Efforts to curb online hate speech depend on our ability to reliably detect it at scale. Previous studies have highlighted the strong zero-shot classification performance of large language models (LLMs), which offers a potential tool for efficiently identifying harmful content. Yet for complex and ambivalent tasks like hate speech detection, pre-trained LLMs can be insufficient and can carry systemic biases. Domain-specific models, fine-tuned for the given task and empirical context, could help address these issues, but, as we demonstrate, the quality of the data used for fine-tuning decisively matters. In this study, we fine-tuned GPT-3.5 using a unique corpus of online comments annotated by diverse groups of coders with varying annotation quality: research assistants, activists, two kinds of crowd workers, and citizen scientists. We find that only annotations from those annotator groups that are better than zero-shot GPT-3.5 at recognizing hate speech improve the classification performance of the fine-tuned LLM. Specifically, fine-tuning on annotations from the two highest-quality annotator groups (research assistants and Prolific crowd workers) boosts classification performance by increasing the model's precision without notably sacrificing the good recall of zero-shot GPT-3.5. In contrast, low-quality annotations fail to improve, or even reduce, the model's ability to identify hate speech.
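The selection criterion in the abstract (keep only annotator groups that beat zero-shot GPT-3.5) and the precision/recall trade-off it describes can be illustrated with a small sketch. The abstract does not state how annotation quality was measured, so this assumes a hypothetical reference ("gold") label set and uses binary F1 as the quality measure; the function names, the group names `group_a`/`group_b`, and all labels below are made-up illustrations, not data or code from the study.

```python
from typing import Dict, List, Tuple


def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1, treating 1 (hate speech) as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def groups_better_than_baseline(
    gold: List[int], group_labels: Dict[str, List[int]], baseline_pred: List[int]
) -> List[str]:
    """Return the (hypothetical) annotator groups whose F1 against the
    reference labels exceeds the F1 of the zero-shot baseline."""
    _, _, baseline_f1 = precision_recall_f1(gold, baseline_pred)
    return [
        name
        for name, labels in group_labels.items()
        if precision_recall_f1(gold, labels)[2] > baseline_f1
    ]


# Made-up toy labels for illustration only (1 = hate speech, 0 = not).
gold = [1, 1, 0, 0, 1, 0, 0, 0]
# Toy "zero-shot" predictions: every true case found (high recall), some false alarms.
zero_shot = [1, 1, 1, 1, 1, 0, 0, 0]
groups = {
    "group_a": [1, 1, 0, 0, 1, 0, 1, 0],  # one false alarm, no misses
    "group_b": [1, 0, 0, 1, 0, 1, 1, 0],  # many errors
}

if __name__ == "__main__":
    print(precision_recall_f1(gold, zero_shot))
    print(groups_better_than_baseline(gold, groups, zero_shot))
```

In this toy setup only `group_a` clears the baseline, because it keeps full recall while producing fewer false positives, which mirrors the pattern the abstract reports: useful annotations raise precision without giving up recall.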