In this study, we investigate an end-to-end text-independent speaker verification system. The architecture consists of a deep neural network that takes a variable-length speech segment and maps it to a speaker embedding. The objective function separates same-speaker and different-speaker pairs, and the same function is reused to score trials during verification. Similar systems have recently shown promise for text-dependent verification, but we believe this approach has not yet been explored for the text-independent task. We show that, given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates. Relative to the baseline, the end-to-end system reduces EER by 13% on average and by 29% when pooled across test conditions. The fused system reduces EER by 32% on average and by 38% pooled.
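The abstract describes three moving parts: collapsing a variable-length segment into a fixed-size embedding, a pairwise objective that also serves as the verification score, and EER as the evaluation metric. The sketch below illustrates each under stated assumptions; it is not the authors' implementation. The abstract does not specify the pooling or scoring form, so the following are assumptions: simple mean pooling over frame-level network outputs, a PLDA-like quadratic pair score `x.y - x'Sx - y'Sy + b` with a hypothetical diagonal `S` and bias `b`, and a logistic (binary cross-entropy) loss on that score. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Assumed temporal pooling: average frame-level vectors so any
    segment length yields one fixed-size embedding."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def pair_score(x: Vector, y: Vector, s_diag: Vector, bias: float) -> float:
    """Hypothetical PLDA-like score x.y - x'Sx - y'Sy + b (S diagonal here).
    Higher scores mean the pair is more likely the same speaker."""
    cross = sum(a * c for a, c in zip(x, y))
    quad_x = sum(s * v * v for s, v in zip(s_diag, x))
    quad_y = sum(s * v * v for s, v in zip(s_diag, y))
    return cross - quad_x - quad_y + bias


def pair_loss(score: float, same: bool) -> float:
    """-log sigmoid(+score) for same-speaker pairs, -log sigmoid(-score)
    for different-speaker pairs, computed as a numerically stable softplus."""
    z = score if same else -score
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def equal_error_rate(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Sweep thresholds over observed scores; return the operating point
    where miss rate and false-alarm rate are closest (their mean)."""
    pos = [s for s, same in zip(scores, labels) if same]
    neg = [s for s, same in zip(scores, labels) if not same]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        miss = sum(s < t for s in pos) / len(pos)   # targets rejected
        fa = sum(s >= t for s in neg) / len(neg)    # non-targets accepted
        if abs(miss - fa) < best_gap:
            best_gap, eer = abs(miss - fa), (miss + fa) / 2
    return eer


def relative_reduction(baseline: float, system: float) -> float:
    """Relative EER reduction, e.g. 0.13 means 13% lower than the baseline."""
    return (baseline - system) / baseline
```

Because the same `pair_score` used in training is applied unchanged to enrollment/test embeddings at verification time, no separate back-end scorer is needed in this sketch. The threshold sweep in `equal_error_rate` is the simplest exact-on-sample approach; evaluation toolkits often interpolate the ROC curve instead.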
David Snyder
ECRI Institute
Pegah Ghahremani
Amazon (United States)
Daniel Povey
Xiaomi (China)
Johns Hopkins University
Snyder et al. studied this question.
synapsesocial.com/papers/6a1774701723722a886ea653 — DOI: https://doi.org/10.1109/slt.2016.7846260