This article presents a comparative analysis of political bias in the outputs of three Large Language Model (LLM)-based chatbots (ChatGPT, Bing Chat, and Bard) in response to political queries concerning the authoritarian regime in Russia. We investigate whether safeguards implemented in these chatbots contribute to the censorship of information that the regime views as harmful, in particular information about Vladimir Putin and the Russian war against Ukraine, and whether these safeguards enable the generation of false claims, in particular about the regime's internal and external opponents. To detect whether LLM safeguards reiterate political bias, the article compares outputs for prompts focusing on Putin's regime with outputs for prompts dealing with the Russian opposition and with US and Ukrainian politicians. It also examines whether the degree of bias varies with the language of the prompt by comparing outputs concerning political personalities and issues across three languages: Russian, Ukrainian, and English. The results reveal significant disparities in how individual chatbots withhold politics-related information or produce false claims about it. Notably, Bard consistently refused to respond to queries about Vladimir Putin in Russian, even when the relevant information was accessible via Google Search, and generally followed the censorship guidelines that, according to Yandex-related data leaks, were issued by the Russian authorities. In terms of false claims, we find substantial variation across languages: Ukrainian and Russian prompts generate false information more often, and Bard is more prone than the other chatbots to produce false claims about opponents of the Russian regime (e.g., Navalny or Zelenskyy). This research aims to stimulate further dialogue and research on developing safeguards against the misuse of LLMs outside of democratic environments.
Urman et al. (Fri,) studied this question.