Mitigating human-like biases and social stereotypes in pre-trained language models (PLMs) has become a crucial task in Chinese toxic speech detection. While prior work on PLMs has achieved state-of-the-art results in mitigating explicit bias (e.g., bias on sensitive attribute words), implicit bias (e.g., bias against specific demographic groups) remains under-explored. Moreover, existing debiasing methods focus on the trade-off between debiasing efficiency and model performance while ignoring robustness against noisy data. A debiasing method that effectively reduces bias while remaining robust to noisy data is therefore needed. In this paper, we propose Unbiased Contrastive Learning (UCL), which mitigates both explicit and implicit bias while maintaining robustness to noisy data. Specifically, we first analyze the biased representation problem induced by the contrastive learning objective, and then implement an unbiased contrastive objective that learns unbiased text representations to mitigate explicit and implicit biases in Chinese toxic speech detection. UCL inherits the idea of supervised contrastive learning, which encourages representations of the same sensitive attribute to be closer than those of different sensitive attributes, and ensures unbiasedness by penalizing the sensitive attribute information contained in the representations. Furthermore, we design conditional normalization to reduce biased classification caused by the imbalanced distribution of demographic groups in the data. Experimental results on Chinese and English datasets show that the proposed method outperforms state-of-the-art methods while achieving competitive performance.
Feng et al. (Mon,) studied this question.
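To make the supervised contrastive idea that UCL builds on more concrete, the following is a minimal sketch of a generic supervised contrastive loss in NumPy. It is not the UCL objective: the abstract does not give the form of the sensitive-attribute penalty or of conditional normalization, so both are omitted. The function name, the `groups` argument, and the `temperature` default are assumptions made for illustration only.

```python
import numpy as np


def supervised_contrastive_loss(z, groups, temperature=0.1):
    """Generic supervised contrastive loss (illustrative sketch, not UCL).

    z:      (n, d) array of text representations.
    groups: (n,) array of ids; samples sharing an id are treated as positives.
            Which attribute defines a group is an assumption of this sketch.
    """
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)

    # Compare representations by cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each anchor's similarity with itself.
    n = len(groups)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over all other samples.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max + np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = sim - log_denominator

    # Positives: same group id, excluding the anchor itself.
    pos_mask = (groups[:, None] == groups[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0

    # Average log-probability of the positives, for anchors that have any.
    sum_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -sum_pos[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


# Toy usage: two tight clusters of 2-d representations.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned = supervised_contrastive_loss(z, [0, 0, 1, 1])
misaligned = supervised_contrastive_loss(z, [0, 1, 0, 1])
```

In this toy example the loss is lower when the group ids match the geometric clusters (`aligned`) than when they cut across them (`misaligned`), which is the pull-together, push-apart behavior the abstract describes. A debiasing objective in the spirit of UCL would add a term penalizing sensitive-attribute information in `z`; that term is deliberately left out here because its form is not stated.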