March 26, 2026 | Open Access

A Systematic Review of Audio Deepfake Detection Using Hybrid Deep Models and Feature Fusion

Key Points

Aims to develop a robust system for detecting deepfake audio using advanced neural networks.
Uses recurrent neural networks (RNNs) and long short-term memory (LSTM) networks.
Extracts audio features using spectrograms and Mel-frequency cepstral coefficients (MFCCs).
Tests the detection models on several datasets containing real and deepfake audio samples.
Reports effective classification of real and deepfake audio samples.
Offers insight into the models' robustness in real-life situations.
Highlights the need for reliable deepfake audio detection to protect the privacy and integrity of digital communications.
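
The two feature types named in the key points, spectrograms and MFCCs, can be illustrated with a minimal sketch. This is not the authors' pipeline: the frame length, hop size, filter count, and all function names below are assumptions chosen for illustration, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Apply a Hann window, then compute the power spectrum with a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-power spectrogram in dB: one row per frame, one column per bin."""
    return [[10 * math.log10(p + 1e-10) for p in power_spectrum(f)]
            for f in frame_signal(signal, frame_len, hop)]

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    edges = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bins
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128,
         n_filters=20, n_coeffs=13):
    """MFCCs: log mel-band energies followed by a DCT-II, per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(signal, frame_len, hop):
        spec = power_spectrum(frame)
        log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                   for row in bank]
        coeffs = [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                      for j, e in enumerate(log_mel))
                  for c in range(n_coeffs)]
        features.append(coeffs)
    return features

# Usage: a synthetic 1000 Hz tone sampled at 8000 Hz (illustrative input only).
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(1024)]
spectrogram = log_spectrogram(tone)   # 7 frames x 129 frequency bins
coefficients = mfcc(tone, 8000)       # 7 frames x 13 coefficients
# The tone's energy peaks at bin 1000 * 256 / 8000 = 32 in every frame.
```

In practice a library implementation with an FFT would replace the naive DFT here; the sketch only shows what each feature represents.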

Abstract

Deepfake audio is artificial or synthetic human-like speech generated by AI algorithms. The technology can lead to privacy violations and data security breaches in digital communication. Because most existing detection methods can no longer keep pace with new audio generation capabilities, a race between generation and detection has begun. This article aims to build a robust system for recognizing deepfake audio using recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. The models classify audio as real or fake from features produced by two extraction techniques: spectrograms and Mel-frequency cepstral coefficients (MFCCs). The proposed RNN- and LSTM-based architectures are tested on several datasets containing real and deepfake audio samples to assess their effectiveness in real-life situations. The paper also stresses the importance of deepfake audio detection for safeguarding privacy, electronic communications, and audio evidence in court. Its findings suggest that deep learning techniques can be used to counter the threat of deepfake audio and advance the state of the art.
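
To make the classification step concrete, the following is a minimal sketch of how an LSTM could map a sequence of per-frame feature vectors (for example, 13 MFCCs per frame) to a single real-versus-fake probability. It is not the architecture from the paper: the hidden size, the random untrained weights, the single-layer design, and the convention that the output is the probability of "fake" are all assumptions made for illustration.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def init_lstm(input_size, hidden_size, seed=0):
    """Random, untrained parameters standing in for learned weights."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    params = {name: {"W": mat(hidden_size, input_size + hidden_size),
                     "b": [0.0] * hidden_size}
              for name in ("forget", "input", "cell", "output")}
    params["classifier"] = {
        "w": [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)],
        "b": 0.0,
    }
    return params

def gate(p, z, activation):
    """One gate: activation(W z + b)."""
    return [activation(a + b) for a, b in zip(matvec(p["W"], z), p["b"])]

def lstm_step(params, x, h, c):
    """One LSTM time step on input x with hidden state h and cell state c."""
    z = list(x) + list(h)                      # concatenate input and hidden state
    f = gate(params["forget"], z, sigmoid)     # what to keep from the old cell state
    i = gate(params["input"], z, sigmoid)      # how much new information to write
    g = gate(params["cell"], z, math.tanh)     # candidate cell values
    o = gate(params["output"], z, sigmoid)     # what to expose as hidden state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def predict_fake_probability(params, frames):
    """Run the LSTM over all frames; classify from the final hidden state."""
    hidden_size = len(params["classifier"]["w"])
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frames:
        h, c = lstm_step(params, x, h, c)
    logit = sum(w * v for w, v in zip(params["classifier"]["w"], h))
    return sigmoid(logit + params["classifier"]["b"])

# Usage: 50 frames of 13 placeholder features (random values, not real audio).
rng = random.Random(1)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(13)] for _ in range(50)]
params = init_lstm(input_size=13, hidden_size=8)
probability = predict_fake_probability(params, frames)  # value in (0, 1)
```

Because the weights are untrained, the output carries no meaning about the input; a real detector would learn these parameters from labeled real and deepfake recordings, typically with a deep learning framework rather than hand-written loops.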
