In this study, we investigate an end-to-end text-independent speaker verification system. The architecture consists of a deep neural network that takes a variable-length speech segment and maps it to a speaker embedding. The objective function separates same-speaker and different-speaker pairs, and is reused during verification. Similar systems have recently shown promise for text-dependent verification, but we believe that this approach is unexplored for the text-independent task. We show that, given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates. Relative to the baseline, the end-to-end system reduces EER by 13% on average and 29% pooled across test conditions. The fused system achieves a reduction of 32% on average and 38% pooled.
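The EER figures quoted in the abstract are relative reductions in the operating point where the miss rate equals the false-alarm rate. As a rough illustration only, the following is a minimal sketch of how EER and a relative reduction could be computed from verification scores. It assumes higher scores mean "same speaker"; the function names `equal_error_rate` and `relative_reduction` and the simple threshold sweep are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means the trial is more likely same-speaker.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # Miss rate: fraction of same-speaker trials scoring below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of different-speaker trials at or above it.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    i = np.argmin(np.abs(miss - false_alarm))
    return (miss[i] + false_alarm[i]) / 2.0


def relative_reduction(baseline, system):
    """Relative improvement of `system` over `baseline` (both error rates)."""
    return (baseline - system) / baseline
```

An "average" reduction would apply `relative_reduction` per test condition and average the results, whereas a "pooled" figure would first combine the trials from all conditions and compute a single EER; this reading of the two terms is an assumption based on common usage.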
Snyder et al. studied this question.
synapsesocial.com/papers/6a1774701723722a886ea653 — DOI: https://doi.org/10.1109/slt.2016.7846260
David Snyder
ECRI Institute
Pegah Ghahremani
Amazon (United States)
Daniel Povey
Xiaomi (China)
Johns Hopkins University