What question did this study set out to answer?

The aim is to reduce implicit bias in toxic speech detection tasks while ensuring model robustness against noisy data.

May 13, 2026 | Open Access

Mitigating Implicit Bias in Chinese Toxic Speech Detection via Unbiased Contrastive Learning

Key Points

Proposed Unbiased Contrastive Learning (UCL) for mitigating bias in text representations.
Implemented a contrastive learning objective to penalize sensitive attribute information.
Designed conditional normalization to address biased classification from imbalanced demographic distributions.
UCL outperformed state-of-the-art methods on both Chinese and English datasets with statistically significant improvements.
Experimental results showed enhanced robustness against noisy data compared to existing debiasing approaches.

Abstract

Mitigating human-like biases and social stereotypes in pre-trained language models (PLMs) has become a crucial task in Chinese toxic speech detection. While PLMs have achieved state-of-the-art results in mitigating explicit bias (e.g., bias on sensitive attribute words), implicit bias (e.g., bias against specific demographic groups) remains under-explored. In addition, existing debiasing methods focus on the trade-off between debiasing efficiency and model performance while ignoring robustness against noisy data. A debiasing method that effectively reduces bias while remaining robust to noisy data is therefore needed. In this paper, we propose Unbiased Contrastive Learning (UCL), which mitigates both explicit and implicit bias while maintaining robustness to noisy data. Specifically, we first analyze the biased representation problem induced by the contrastive learning objective, and then implement an unbiased contrastive objective that learns unbiased text representations to mitigate explicit and implicit biases in Chinese toxic speech detection tasks. UCL inherits the idea of supervised contrastive learning, which encourages representations of the same sensitive attribute to be closer than those of different sensitive attributes, and ensures unbiasedness by penalizing the sensitive attribute information contained in the representations. Furthermore, we design conditional normalization to reduce biased classification caused by the imbalanced distribution of demographic groups in the data. Experimental results on Chinese and English datasets show that the proposed method outperforms the state-of-the-art methods and achieves competitive performance.
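For readers unfamiliar with the supervised contrastive learning that the abstract says UCL builds on, the following is a minimal sketch of the standard supervised contrastive loss only. It is not the paper's unbiased objective: the abstract does not specify how the sensitive-attribute penalty or conditional normalization is formulated, so neither is shown. The function names, the `temperature` default, and the toy embeddings are illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss over a batch.

    For each anchor, samples sharing its label are positives and every
    other sample in the batch forms the softmax denominator. Anchors with
    no positive are skipped. This is a generic sketch, not the UCL loss.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp for the denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in 2-D (hypothetical data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
print(aligned < mismatched)  # labels that match the clusters give a lower loss
```

The loss is small when same-label samples already sit close together and large when they do not, which is the pull-together and push-apart behaviour the abstract refers to. Which label defines the positives, and how sensitive-attribute information is then penalized, is a design choice of the paper that this sketch leaves open.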
