April 27, 2022 | Open Access

Fine-Tuning Wav2Vec2 for Speaker Recognition

Abstract

This paper explores applying the wav2vec2 framework to speaker recognition rather than speech recognition. We study how effective the pre-trained weights are for the speaker recognition task and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding. To adapt the framework to speaker recognition, we propose a single-utterance classification variant trained with cross-entropy or additive angular softmax loss, and an utterance-pair classification variant trained with binary cross-entropy (BCE) loss. Our best-performing variant achieves a 1.88% equal error rate (EER) on the extended VoxCeleb1 test set, compared to 1.69% EER with an ECAPA-TDNN baseline. Code is available at github.com/nikvaessen/w2v2-speaker.
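
To make two terms from the abstract concrete, the sketch below illustrates what pooling a variable-length frame sequence into a fixed-length embedding can look like, and how an EER can be estimated from trial scores. It is a minimal illustration only, not the method from the paper or the linked repository: mean pooling is just one simple pooling choice assumed here, cosine similarity is an assumed scoring function, and the names mean_pool, cosine_similarity, and equal_error_rate are hypothetical. The frame vectors stand in for wav2vec2 outputs, and the EER routine is a coarse threshold sweep rather than an interpolated estimate.

```python
import math


def mean_pool(frames):
    """Average a sequence of frame vectors (T x D) into one D-dim embedding.

    Assumption: mean pooling is used purely as an example of pooling.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    A trial is accepted when its score is >= the threshold. The function
    returns the mean of the false acceptance rate and false rejection rate
    at the threshold where the two are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        false_accepts = sum(
            1 for s, l in zip(scores, labels) if l == 0 and s >= threshold
        )
        false_rejects = sum(
            1 for s, l in zip(scores, labels) if l == 1 and s < threshold
        )
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In this sketch, utterances of different durations (different numbers of frames) all map to embeddings of the same dimension, which is what allows two utterances to be compared with a single similarity score. An EER of 1.88% corresponds to an operating point where roughly 1.88% of same-speaker trials are rejected and 1.88% of different-speaker trials are accepted.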
