December 1, 2016

Deep neural network-based speaker embeddings for end-to-end speaker verification

Abstract

In this study, we investigate an end-to-end text-independent speaker verification system. The architecture consists of a deep neural network that takes a variable-length speech segment and maps it to a speaker embedding. The objective function separates same-speaker and different-speaker pairs, and is reused during verification. Similar systems have recently shown promise for text-dependent verification, but we believe that this is unexplored for the text-independent task. We show that, given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates. Relative to the baseline, the end-to-end system reduces EER by 13% on average and 29% pooled across test conditions. The fused system achieves a reduction of 32% on average and 38% pooled.
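The abstract reports its results as equal error rate (EER) and as relative reductions against a baseline. As a minimal sketch of how those two quantities are commonly computed, the following assumes that a higher score means "more likely the same speaker" and that a trial is accepted when its score meets the threshold. The function names and the scores are hypothetical illustrations, not the authors' evaluation code or data.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where the miss rate and the
    false-alarm rate are closest to each other.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    best = None  # (gap, eer_estimate, threshold)
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a same-speaker trial that is rejected.
        miss = sum(s < threshold for s in target_scores) / len(target_scores)
        # False alarm: a different-speaker trial that is accepted.
        false_alarm = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2, threshold)
    return best[1], best[2]


def relative_reduction(baseline, system):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - system) / baseline


# Hypothetical scores, chosen only to make the example easy to follow.
same_speaker = [0.9, 0.8, 0.7, 0.3]
different_speaker = [0.6, 0.4, 0.2, 0.1]
eer, threshold = equal_error_rate(same_speaker, different_speaker)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

Under this reading, a 13% relative reduction means the system's EER is 0.87 times the baseline's EER. "Pooled" results are typically computed from the scores of all test conditions combined, while "average" results take the mean of the per-condition figures; the abstract itself does not spell out the procedure.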

Cite this study

Snyder et al. (2016) studied this question.

synapsesocial.com/papers/6a1774701723722a886ea653 — DOI: https://doi.org/10.1109/slt.2016.7846260

Authors

David Snyder

ECRI Institute

Pegah Ghahremani

Amazon (United States)

Daniel Povey

Xiaomi (China)

Institutions

Johns Hopkins University
