The rapid spread of disinformation on social media is a major challenge of the digital age, with significant effects on public opinion and decision-making. This study proposes a machine learning approach to the automatic detection of online disinformation. It compares several supervised learning models: logistic regression, support vector machines (SVMs), random forests, and gradient boosting. The experiments use a real-world dataset of textual content from digital platforms, preprocessed using TF-IDF. Hyperparameters are tuned, primarily with grid search, and the models are evaluated by splitting the data into training and test sets, allowing for a reliable estimate of their performance. All models perform very well, with accuracy above 98% and area under the ROC curve (AUC) close to 1. Gradient boosting is the best performer, offering an excellent balance between accuracy and generalization, while the random forest, despite a perfect AUC, shows potential signs of overfitting. The results also show that hyperparameter optimization significantly improves the performance of classical models. Certain limitations related to the diversity of data sources and to methodological choices must nevertheless be taken into account. The study highlights the effectiveness of machine learning methods for disinformation detection and opens avenues for integrating more advanced techniques, including deep learning and multimodal analysis, into disinformation countermeasure systems.
Katadi et al. studied this question.
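The workflow described in the abstract (TF-IDF features, four classical classifiers, grid search, a train/test split, accuracy and AUC) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, which the abstract does not name, the toy corpus and its labels are hypothetical stand-ins for the study's dataset, and the parameter grids are arbitrary examples.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (1 = disinformation, 0 = reliable); NOT the study's data.
topics = ["vaccines", "elections", "climate", "taxes", "schools", "energy",
          "housing", "transport", "farming", "banking", "water", "trade"]
texts = [f"shocking secret about {t} they hide from you" for t in topics]
texts += [f"official report on {t} published by the agency" for t in topics]
labels = [1] * len(topics) + [0] * len(topics)

# Hold out a stratified test set so the scores are measured on unseen texts.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Illustrative models and grids (assumed values, not those of the paper).
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "svm": (LinearSVC(), {"clf__C": [0.1, 1, 10]}),
    "rf": (RandomForestClassifier(random_state=0),
           {"clf__n_estimators": [50, 100]}),
    "gb": (GradientBoostingClassifier(random_state=0),
           {"clf__n_estimators": [50, 100]}),
}

results = {}
for name, (clf, grid) in candidates.items():
    # Keeping TF-IDF inside the pipeline refits it on each CV training fold,
    # so validation folds do not leak vocabulary statistics.
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
    search = GridSearchCV(pipe, grid, cv=3, scoring="roc_auc")
    search.fit(X_train, y_train)
    best = search.best_estimator_

    # AUC needs continuous scores: use probabilities when available,
    # otherwise the signed distance to the decision boundary (LinearSVC).
    if hasattr(best, "predict_proba"):
        scores = best.predict_proba(X_test)[:, 1]
    else:
        scores = best.decision_function(X_test)

    results[name] = (
        accuracy_score(y_test, best.predict(X_test)),
        roc_auc_score(y_test, scores),
    )

for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f} auc={auc:.3f}")
```

Comparing each model's cross-validated score (`search.best_score_`) with its test score is one simple way to look for the kind of overfitting the abstract mentions for the random forest.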