May 23, 2024

Speech Audio Deepfake Detection via Convolutional Neural Networks

Lucas P. Valente; Marcelo Marques Simões de Souza (Universidade Federal do Ceará); Alan Marques da Rocha (Universidade Federal do Ceará)

Abstract

The production of artificial media content has ethical, legal, and social implications for journalism, education, entertainment, and industry. Software tools are now available to anyone who intends to maliciously generate or tamper with digital voice recordings. In this context, verifying voice authenticity is important to avoid the consequences of criminal use. Here, we propose applying convolutional neural networks (CNNs) and Mel spectrograms to the detection of artificially generated voices. Supervised experiments with speech signals collected from several voice datasets were conducted to find the CNN topology that achieves the best detection accuracy, regardless of the language spoken. The best accuracy scores are 99% for the FoR dataset, 94% for ASV, and 98% for WaveFake. Training the model on all datasets together and testing on each dataset individually yields accuracies of 98% for FoR, 92% for ASV, and 96% for WaveFake. These results are comparable to those reported in the state of the art, supporting the viability of the model.
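
As an illustration of the Mel spectrogram input mentioned in the abstract, the following is a minimal NumPy sketch of a log-Mel spectrogram computation. It is not the authors' pipeline: the abstract does not state the sampling rate, FFT size, hop length, or number of Mel bands, so every parameter value and function name below is an assumption chosen for demonstration, and the HTK-style Mel formula is one common convention among several.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style Mel scale (an assumed convention; others exist).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, center, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    # Hypothetical defaults; the paper's actual settings are not given here.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    # Convert to decibels; the small floor avoids log(0).
    return 10.0 * np.log10(mel_energy + 1e-10).T          # (n_mels, n_frames)

# Demo on a synthetic 440 Hz tone standing in for a speech sample.
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
print(spec.shape)
```

The resulting two-dimensional array (Mel bands by time frames) can be treated like a single-channel image, which is what makes it a natural input for a CNN classifier.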
