December 1, 2015

Time delay deep neural network-based universal background models for speaker recognition

Abstract

Recently, deep neural networks (DNNs) have been incorporated into i-vector-based speaker recognition systems, where they have significantly improved state-of-the-art performance. In these systems, a DNN is used to collect sufficient statistics for i-vector extraction. In this study, the DNN is a recently developed time delay deep neural network (TDNN) that has achieved promising results in large-vocabulary continuous speech recognition (LVCSR) tasks. We believe that the TDNN-based system achieves the best reported results on SRE10, and it obtains a 50% relative improvement over our GMM baseline in terms of equal error rate (EER). For some applications, the computational cost of a DNN is high. Therefore, we also investigate a lightweight alternative in which a supervised GMM is derived from the TDNN posteriors. This method maintains the speed of the traditional unsupervised GMM but achieves a 20% relative improvement in EER.
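The two ideas in the abstract, collecting sufficient statistics from network posteriors and deriving a supervised GMM from those same posteriors, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes frame-level posteriors of shape (T, C) and acoustic features of shape (T, D) are already available, and it uses diagonal covariances for brevity, which is a simplification made here. The function names `sufficient_stats` and `supervised_gmm` are hypothetical.

```python
import numpy as np


def sufficient_stats(posteriors, features):
    """Zeroth- and first-order statistics from frame posteriors.

    posteriors: (T, C) array, each row sums to 1 (e.g. TDNN outputs).
    features:   (T, D) array of acoustic feature vectors.
    """
    zeroth = posteriors.sum(axis=0)    # (C,)   soft frame count per class
    first = posteriors.T @ features    # (C, D) posterior-weighted feature sums
    return zeroth, first


def supervised_gmm(posteriors, features, floor=1e-6):
    """Estimate a diagonal-covariance GMM with one component per class.

    Assumption: one pass of posterior-weighted moment estimation;
    diagonal covariance is chosen here only to keep the sketch short.
    """
    zeroth, first = sufficient_stats(posteriors, features)
    second = posteriors.T @ (features ** 2)    # (C, D) weighted sums of squares
    counts = np.maximum(zeroth, floor)         # guard against empty classes
    weights = zeroth / zeroth.sum()
    means = first / counts[:, None]
    variances = np.maximum(second / counts[:, None] - means ** 2, floor)
    return weights, means, variances
```

Under these assumptions, the resulting GMM can compute frame posteriors at test time without running the network, which is where the speed advantage described in the abstract would come from.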

Authors

David Snyder

ECRI Institute

Daniel Garcia-Romero

Johns Hopkins University

Daniel Povey

Xiaomi (China)

Institutions

Johns Hopkins University
