Deepfakes, synthetic multimedia files generated by artificial intelligence (AI), are drastically undermining digital credibility. Their ability to manipulate our perception of reality has created a new and complex battleground for disinformation, and the threat is especially critical for non-English audio with distinctive accents. Consequently, the objective of this study is to determine the human capacity to detect deepfake audio in Spanish with a Paraguayan accent through an experiment conducted with an Android application called ReFake (developed specifically for this research). In this experiment, 450 participants, aged 16–72, evaluated 10 audio samples of up to 15 s each, classifying them as authentic (belonging to Paraguayan journalists) or fake (generated with ElevenLabs). The findings suggest that the human ear is more accurate than AI at detecting vocal ‘naturalness’. This ability is influenced by generation and educational level, with younger people and those with postgraduate degrees performing better. Conversely, gender and nationality do not influence detection, although the high prosodic quality of deepfakes still leads to errors in human judgment. Given these results, it is crucial to adapt existing strategies and develop new ones for a secure and resilient online ecosystem.
Ramos et al. studied this question.