February 2, 2015

Toward a Universal Synthetic Speech Spoofing Detection Using Phase Information

Abstract

In the field of speaker verification (SV), it is now feasible and relatively easy to create a synthetic voice to deceive a speech-driven biometric access system. This paper presents a synthetic speech detector that can be connected at the front end or at the back end of a standard SV system, and that protects it from spoofing attacks coming from state-of-the-art statistical Text-to-Speech (TTS) systems. The system described is a Gaussian Mixture Model (GMM) based binary classifier that uses natural and copy-synthesized signals obtained from the Wall Street Journal database to train the system models. Three different state-of-the-art vocoders are chosen and modeled using two sets of acoustic parameters: 1) relative phase shift and 2) canonical Mel Frequency Cepstral Coefficients (MFCC), as a baseline. The vocoder dependency of the system and multivocoder modeling features are thoroughly studied. Additional phase-aware vocoders are also tested. Several experiments are carried out, showing that the phase-based parameters perform better and are able to cope with new, unknown attacks. The final evaluations, testing synthetic TTS signals obtained from the Blizzard Challenge, validate our proposal.
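
As a rough illustration of the GMM-based binary classifier described in the abstract, the sketch below scores an utterance by the average per-frame log-likelihood ratio between a "natural" model and a "synthetic" model. It is a minimal sketch under stated assumptions, not the authors' implementation: the diagonal covariances, the zero threshold, the two-dimensional toy features, and all names (`llr_score`, `is_natural`, `NATURAL_GMM`, `SYNTHETIC_GMM`) are hypothetical. In practice the mixture parameters would be trained on relative phase shift or MFCC features rather than written by hand.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def llr_score(frames, natural_gmm, synthetic_gmm):
    """Average per-frame log-likelihood ratio (natural vs. synthetic)."""
    total = sum(
        gmm_log_likelihood(f, natural_gmm) - gmm_log_likelihood(f, synthetic_gmm)
        for f in frames
    )
    return total / len(frames)

def is_natural(frames, natural_gmm, synthetic_gmm, threshold=0.0):
    """Accept the utterance as natural speech if the score exceeds threshold.

    The threshold of 0.0 is an assumption; a real system would tune it.
    """
    return llr_score(frames, natural_gmm, synthetic_gmm) > threshold

# Hypothetical toy models with made-up parameters (not from the paper).
NATURAL_GMM = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
SYNTHETIC_GMM = [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])]
```

For example, `is_natural([[0.2, 0.1], [0.8, 1.1]], NATURAL_GMM, SYNTHETIC_GMM)` accepts the frames as natural, while frames near `[4.5, 4.6]` are rejected as synthetic. Averaging over frames keeps the score comparable across utterances of different lengths.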

Cite This Study

Sánchez et al. (2015) studied this question.

synapsesocial.com/papers/6a2008a38fbc0747110dc2f1 https://doi.org/10.1109/tifs.2015.2398812
