August 10, 2023

Offensive Language Detection in Social Media Using Ensemble Techniques

Abstract

Hate speech and offensive content are deliberate attacks directed at a group or community on the basis of characteristics such as religion, gender, or race, and they pose a threat to society. The growing use of social media platforms such as Facebook, WhatsApp, and Instagram has led to an increase in this type of content. Detecting and preventing its spread is crucial to avoid negative impacts on society. However, manually analyzing such a vast and continuously growing volume of text is both challenging and tedious. This work covers the use of machine learning techniques to recognize offensive speech on social media. A comparative analysis of several popular algorithms, including Naive Bayes (NB), Support Vector Machines (SVM), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Decision Tree, and Random Forest, has been performed, and their performance has been evaluated using recall, precision, F1-score, and accuracy. Experiments on a dataset of Twitter posts containing hate speech show that Random Forest is the most effective algorithm, achieving an accuracy of 97%. The effectiveness of the models has also been examined with respect to different feature extraction methods, namely bag-of-words and TF-IDF (term frequency-inverse document frequency). The results show that Random Forest combined with TF-IDF is the best of the evaluated approaches for detecting offensive language on social media.
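
To make the feature extraction methods and evaluation metrics named in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. The function names (`bag_of_words`, `tf_idf`, `binary_metrics`), the whitespace tokenization, and the particular TF-IDF variant (term count divided by document length, times the natural log of N/df) are all assumptions made for illustration; the abstract does not specify them, and a real experiment would more likely rely on an established library.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Bag-of-words: raw term counts per document (whitespace tokens, lowercased)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tf_idf(docs):
    """TF-IDF weights per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Terms that occur in every document therefore get weight 0.
    """
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
        )
    return weights


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score, and accuracy for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Toy, made-up inputs purely for illustration (not data from the paper).
    docs = ["you are great", "you are awful awful"]
    print(bag_of_words(docs))
    print(tf_idf(docs))
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In this sketch, bag-of-words keeps raw counts, while TF-IDF down-weights terms that appear in many documents, which is one plausible reason the two representations can lead to different classifier performance.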
