Online platforms are crucial hubs for social interaction and the exchange of ideas, but they are persistently affected by toxic comments that undermine genuine discourse and can harm participants. Machine learning offers a way to detect such content, yet models trained on biased datasets can inadvertently propagate or amplify those biases. This is especially concerning when the biases disadvantage or misrepresent identities that are already vulnerable in online spaces. We present a model designed to detect toxic comments with higher accuracy while minimizing such unintended bias. Our approach combines a tailored data preprocessing technique with Long Short-Term Memory (LSTM) networks augmented with attention mechanisms. In preliminary evaluations, the model achieves an AUC score of 0.93524, indicating that it is effective at toxicity detection. Although there is room for improvement, the design and results of our model underline the importance and feasibility of developing more nuanced and unbiased machine learning solutions for the challenges posed in the digital domain.
Dai et al. studied this question.
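The abstract reports a single area-under-the-curve score for the classifier. As a minimal sketch of what such a score measures, assuming it is the standard ROC AUC (the abstract does not say how the figure was computed), the function below estimates the probability that a randomly chosen toxic comment receives a higher score than a randomly chosen non-toxic one. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = toxic, 0 = non-toxic; scores are model outputs.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

Here five of the six toxic/non-toxic pairs are ordered correctly, so the score is 5/6. The pairwise loop is quadratic in the number of examples; it is meant to illustrate the definition, and a rank-based implementation would be preferable on a full dataset.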