June 27, 2024. Open Access

Information Technology for Recognizing Propaganda, Fakes and Disinformation in Textual Content Based on NLP and Machine Learning Methods

Abstract

Context. The research applies artificial intelligence to the development and improvement of means of cyber warfare, in particular for combating disinformation, fakes and propaganda on the Internet and for identifying sources of disinformation and the inauthentic behavior (bots) of coordinated groups. The project addresses the important and currently relevant problem of information manipulation in the media: to fight distortion and disinformation effectively, one needs an effective tool for recognizing these phenomena in textual data, so that a strategy for preventing the spread of such data can then be developed. Objective. The objective of the study is to develop information technology for the automatic recognition of political propaganda in textual data, built on supervised machine learning and implemented using natural language processing methods. Method. Propaganda is recognized at two levels: at the general level, that is, the level of the whole document, and at the level of individual sentences. Features are constructed using the TF-IDF statistical indicator, the “Bag of Words” vectorization model, part-of-speech tagging, the word2vec model for obtaining vector representations of words, and the recognition of trigger words (intensifying words, absolute pronouns and “shiny” words). Logistic regression was used as the main modeling algorithm. Results. Machine learning models have been developed to recognize propaganda, fakes and disinformation at the document (article) level and at the sentence level. Both model scores are satisfactory, but the document-level model scored about 20 percentage points higher, roughly 1.27 times the sentence-level score. Conclusions. The created models recognize propaganda, fakes and disinformation in textual content using NLP and machine learning methods, with excellent results at the document level and satisfactory results at the sentence level.
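The feature construction named in the Method paragraph can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the trigger-word lists, the function names and the two sample texts are hypothetical, the TF-IDF variant (term frequency times log(N / df)) is one common choice among several, and part-of-speech tagging, word2vec and the logistic regression classifier are left out.

```python
import math
import re
from collections import Counter

# Hypothetical trigger-word lists; the paper's actual lexicons are not given here.
INTENSIFIERS = {"very", "extremely", "totally", "absolutely"}
ABSOLUTES = {"everyone", "nobody", "always", "never", "all"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(tokens):
    """Bag of Words: raw term counts for one document."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs_tokens:
        counts = bag_of_words(tokens)
        total = len(tokens) or 1
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def trigger_counts(tokens):
    """Count tokens that appear in each (hypothetical) trigger-word list."""
    return {
        "intensifiers": sum(t in INTENSIFIERS for t in tokens),
        "absolutes": sum(t in ABSOLUTES for t in tokens),
    }


# Two made-up example texts, used only to exercise the functions.
docs = [
    "Everyone knows this is absolutely the only truth.",
    "The council met on Tuesday to review the budget.",
]
tokens = [tokenize(d) for d in docs]
weights = tf_idf(tokens)
triggers = [trigger_counts(t) for t in tokens]
```

In a sketch like this, a term that occurs in every document (here "the") receives a TF-IDF weight of zero, while the trigger counts add a small set of hand-crafted features that could be concatenated with the TF-IDF vector before training a classifier.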
An analysis of the raw data showed that the document-level (article-level) propaganda recognition model correctly classified 6097 non-propaganda articles and 694 propaganda articles, while 123 propaganda articles and 285 non-propaganda articles were misclassified. The resulting model score is 0.9433254618697041, which matches the share of correctly classified articles (6791 of 7199). The sentence-level propaganda recognition model correctly classified 205 propaganda articles and 1917 non-propaganda articles, while 731 articles were misclassified. Its model score is 0.7437784787942516, again the share of correct classifications (2122 of 2853).
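The reported scores can be checked against the counts above. The sketch below assumes that the "model score" is plain accuracy, that propaganda is the positive class, and that the 123 and 285 misclassified articles are false negatives and false positives respectively; the function name is illustrative.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Document-level counts as stated in the text (propaganda = positive class, assumed).
doc_level = confusion_metrics(tp=694, tn=6097, fp=285, fn=123)

# Sentence-level: only the correct and incorrect totals are stated,
# so only accuracy can be recomputed.
sentence_correct = 205 + 1917
sentence_accuracy = sentence_correct / (sentence_correct + 731)

# Gap between the two scores, in absolute terms and as a ratio.
gap = doc_level["accuracy"] - sentence_accuracy
ratio = doc_level["accuracy"] / sentence_accuracy
```

Under these assumptions both reported scores are reproduced as accuracy, the gap is just under 0.20, and the ratio is about 1.27. Document-level precision and recall for the propaganda class come out at roughly 0.71 and 0.85, which accuracy alone does not show because non-propaganda articles dominate the data.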

Authors

Victoria Vysotska

Osnabrück University

Journals

Radio Electronics Computer Science Control
