The production of artificial media content raises ethical, legal and social implications for journalism, education, entertainment and industry. Software tools are currently available to anyone who intends to maliciously generate or tamper with digital voice audio. In this context, detecting voice authenticity is important to avoid the consequences of criminal use. Here, we propose the application of convolutional neural networks (CNN) and Mel spectrograms to the detection of artificially generated voices. Supervised experiments with speech signal samples, collected from several voice datasets, were conducted to find the CNN topology that achieves the best detection accuracy regardless of the language spoken. The best accuracy scores found are 99% for the FoR dataset, 94% for the ASV dataset and 98% for the WaveFake dataset. Training the model with all datasets together and testing on each dataset individually yields accuracies of 98% for FoR, 92% for ASV and 96% for WaveFake. These results are comparable to those reported in the state of the art, supporting the viability of the model.
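To make the feature-extraction step concrete, the following is a minimal sketch of how a log-Mel spectrogram can be computed from a raw speech signal before it is passed to a CNN. It is not the authors' pipeline: the abstract does not state the frame size, hop length, number of Mel bands or CNN topology, so the values below (`n_fft=256`, `hop=128`, `n_mels=20`) and all function names are illustrative assumptions. The sketch uses only the Python standard library and a naive DFT for clarity; a practical system would more likely rely on an optimized audio library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # Common (HTK-style) Hz-to-Mel mapping; one of several conventions in use.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters over the n_fft // 2 + 1 DFT bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hann-windowed power spectrum of one frame (naive O(n^2) DFT)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each a list of n_mels log-Mel energies (dB)."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        ps = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, ps)) for row in bank]
        # Small constant avoids log(0) for empty or silent bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


# Illustrative input: a synthetic 440 Hz tone, not real speech.
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(1024)]
mel = log_mel_spectrogram(tone, SAMPLE_RATE)
print(len(mel), "frames x", len(mel[0]), "Mel bands")
```

The resulting time-by-frequency matrix can be treated as a single-channel image, which is what makes a convolutional network a natural classifier for the real-versus-generated decision described above.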
Valente et al. studied this question.
www.synapsesocial.com/papers/68e68be2b6db64358761339b — DOI: https://doi.org/10.1109/eais58494.2024.10569111
Lucas P. Valente
Marcelo Marques Simões de Souza
Alan M. da Rocha