What question did this study set out to answer?

The research aims to automate the detection of online disinformation through machine learning techniques.

April 19, 2026 | Open Access

Online Disinformation Detection through Text Analysis: A Comparative Study of Supervised Models with Hyperparameter Optimization

Key Points

The research aims to automate the detection of online disinformation through machine learning techniques.
Conducted a comparative analysis of supervised learning models including logistic regression, SVMs, random forests, and gradient boosting.
Used a real-world dataset of textual content from digital platforms, applying TF-IDF for preprocessing.
Implemented hyperparameter optimization using Grid Search to enhance model performance.
Evaluated the models on a test set kept separate from the training data to estimate how reliably they perform.
Achieved accuracy values exceeding 98% across all models.
The Gradient Boosting model performed best, with excellent generalization.
The Random Forest model reached a perfect AUC but showed signs of potential overfitting.
Hyperparameter optimization significantly improved the performance of traditional models.
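
The summary does not say how the potential overfitting of the Random Forest model was diagnosed. One common check, sketched here as an assumption rather than the authors' procedure, is to compare AUC on the training data with AUC on the held-out test data: a large gap suggests the model has memorized the training set. The synthetic data from `make_classification` is only a stand-in for TF-IDF features, and the helper name `auc_gap` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def auc_gap(model, X_train, y_train, X_test, y_test):
    """Fit the model, then return (train AUC, test AUC, train minus test)."""
    model.fit(X_train, y_train)
    train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return train_auc, test_auc, train_auc - test_auc


# Synthetic, noisy binary data; NOT the dataset used in the study.
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

train_auc, test_auc, gap = auc_gap(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X_train, y_train, X_test, y_test,
)
print(f"train AUC={train_auc:.3f}  test AUC={test_auc:.3f}  gap={gap:.3f}")
```

A gap near zero is consistent with good generalization. A clearly positive gap is a reason to constrain the model, for example by limiting tree depth.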

Abstract

The rapid spread of disinformation on social media is a major challenge of the digital age, with significant effects on public opinion and decision-making. This study proposes a machine learning-based approach to detecting online disinformation automatically. It compares several supervised learning models: logistic regression, support vector machines (SVMs), random forests, and gradient boosting. The experiments use a real-world dataset of textual content from digital platforms, preprocessed with TF-IDF. Hyperparameter optimization, primarily Grid Search, is applied to improve model performance. The models were evaluated on data split into training and test sets to give a reliable estimate of their performance. All models perform very well, with accuracy above 98% and areas under the ROC curve (AUC) close to 1. The Gradient Boosting model performs best, offering an excellent balance between accuracy and generalization, while the Random Forest model, despite a perfect AUC, shows potential signs of overfitting. The results also show that hyperparameter optimization significantly improves the performance of classical models. However, certain limitations related to the diversity of data sources and methodological choices must be taken into account. The study highlights the effectiveness of machine learning methods for disinformation detection and opens avenues for integrating more advanced techniques, including deep learning and multimodal analysis, into disinformation countermeasure systems.
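
The workflow described in the abstract (TF-IDF features, four supervised models, Grid Search, a train/test split) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus is invented, the parameter grids are arbitrary, `LinearSVC` stands in for the unspecified SVM variant, and the function name `compare_models` is hypothetical.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus invented for illustration (1 = disinformation, 0 = reliable).
TOPICS = ["vaccine", "election", "climate", "economy", "water supply", "new bridge"]
FAKE_TEMPLATES = [
    "shocking secret about the {} that they are hiding from you",
    "you will not believe this miracle truth about the {}",
]
REAL_TEMPLATES = [
    "officials publish the annual report on the {}",
    "researchers review new data on the {} in a study",
]
texts = [t.format(topic) for t in FAKE_TEMPLATES + REAL_TEMPLATES for topic in TOPICS]
labels = [1] * 12 + [0] * 12


def compare_models(texts, labels, seed=0):
    """Grid-search each model on the training split and score it on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    # Assumed grids; the study's actual search spaces are not given in the summary.
    candidates = {
        "logistic_regression": (
            LogisticRegression(max_iter=1000),
            {"clf__C": [0.1, 1, 10]},
        ),
        "linear_svm": (LinearSVC(), {"clf__C": [0.1, 1, 10]}),
        "random_forest": (
            RandomForestClassifier(random_state=seed),
            {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 5]},
        ),
        "gradient_boosting": (
            GradientBoostingClassifier(random_state=seed),
            {"clf__n_estimators": [50, 100], "clf__learning_rate": [0.05, 0.1]},
        ),
    }
    results = {}
    for name, (clf, grid) in candidates.items():
        # TF-IDF lives inside the pipeline so it is refit on each training fold only.
        pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
        search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=3)
        search.fit(X_train, y_train)
        results[name] = {
            "best_params": search.best_params_,
            "cv_auc": search.best_score_,
            "test_auc": search.score(X_test, y_test),  # uses the roc_auc scorer
            "test_accuracy": accuracy_score(y_test, search.predict(X_test)),
        }
    return results


if __name__ == "__main__":
    for name, r in compare_models(texts, labels).items():
        print(name, r["best_params"], f"test AUC={r['test_auc']:.3f}")
```

Keeping the vectorizer inside the `Pipeline` means vocabulary and IDF weights are learned from training folds alone, which avoids leaking test information into the cross-validated scores.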
