Deepfake audio refers to artificial or synthetic human-like speech generated by AI algorithms. This technology can lead to privacy violations and data security breaches in digital communication. Because most existing detection methods can no longer keep pace with new audio generation capabilities, a deepfake audio detection race has begun. This article aims to build a robust system for recognizing deepfake audio using Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. The method classifies real and fake audio using two audio feature extraction techniques, spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs), as inputs to the model. The proposed RNN- and LSTM-based architectures are tested on several datasets containing deepfake and real audio samples to assess their effectiveness in real-life situations. The paper also highlights the importance of deepfake audio detection for safeguarding privacy, electronic communications, and audio evidence in court. Its findings suggest that deep learning techniques can be used to counter the threat of deepfake audio and to advance the state of the art.
Yadav et al. studied this question.
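
The pipeline outlined in the abstract (frame-level spectral features fed to a recurrent classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function and class names (`log_spectrogram`, `TinyLSTMClassifier`), the frame and hidden sizes, and the random, untrained weights are all hypothetical. Only the spectrogram branch is shown; MFCCs would additionally apply a mel filterbank and a discrete cosine transform to each frame. The code uses only the Python standard library, so the DFT is deliberately naive.

```python
import cmath
import math
import random


def log_spectrogram(signal, frame_len=16, hop=8):
    """Log-magnitude spectrogram: Hann-windowed frames, naive DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    spec = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(signal[start:start + frame_len], window)]
        row = []
        for k in range(n_bins):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            row.append(math.log(abs(coeff) + 1e-8))  # small offset avoids log(0)
        spec.append(row)
    return spec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Single LSTM layer plus a logistic output unit.

    Weights are random and UNTRAINED (hypothetical); the output only
    demonstrates the data flow, not a meaningful real/fake decision.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight matrix per gate: input (i), forget (f),
        # candidate cell state (g), output (o).
        self.W = {gate: [[rng.uniform(-0.1, 0.1)
                          for _ in range(n_in + n_hidden)]
                         for _ in range(n_hidden)]
                  for gate in "ifgo"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifgo"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]

    def _gate(self, gate, z, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[gate], self.b[gate])]

    def predict_proba(self, frames):
        """Run the LSTM over the frame sequence; return a score in (0, 1)."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in frames:
            z = list(x) + h  # concatenate current input and previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)))


# Usage on a synthetic sine wave standing in for an audio clip.
signal = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
features = log_spectrogram(signal)
model = TinyLSTMClassifier(n_in=len(features[0]), n_hidden=4)
p_fake = model.predict_proba(features)
```

In practice the weights would be learned from labeled real and fake recordings, and the final hidden state would be thresholded to produce the real-versus-fake decision; a plain RNN variant would replace the four gates with a single `tanh` update.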