Efficient Personal Voice Activity Detection with Wake Word Reference Speech

Abstract

Personal voice activity detection (PVAD) is increasingly used in speech assistants. Traditional PVAD schemes extract the target speaker's embedding from existing query reference speech through a pre-trained speaker verification model. Consequently, the performance of the PVAD model may suffer if the quality of the extracted speaker embedding is poor, for example when only wake word speech is available as the reference. In this work, we introduce a novel and efficient PVAD model. In contrast to conventional approaches that rely on speaker embeddings extracted from a pre-trained speaker verification model, our proposed method directly uses the raw frame-level features of the reference speech as the target speaker's attributes. In this way, our proposed model achieves an ultra-high recall rate, which is vital for speech assistant applications. The experimental results show the effectiveness of our proposed method when using either existing query speech or wake word speech as the reference.
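
For readers unfamiliar with the recall metric mentioned in the abstract, the following is a minimal sketch of how frame-level recall for target-speaker speech could be computed. The class names ("tss" for target speaker speech, "ntss" for non-target speaker speech, "ns" for non-speech), the function name frame_recall, and the toy labels are assumptions made for illustration; they are not taken from this paper, and the authors' evaluation protocol may differ.

```python
def frame_recall(labels, predictions, target="tss"):
    """Fraction of true target-speaker frames that the detector also flagged.

    `labels` and `predictions` are per-frame class names. The class names
    used here are hypothetical placeholders, not the paper's notation.
    """
    pairs = list(zip(labels, predictions))
    true_positive = sum(1 for y, p in pairs if y == target and p == target)
    false_negative = sum(1 for y, p in pairs if y == target and p != target)
    total = true_positive + false_negative
    # With no target frames at all, recall is undefined; return 0.0 by convention.
    return true_positive / total if total else 0.0


# Toy example: four target-speaker frames, one of which is missed.
labels = ["ns", "tss", "tss", "ntss", "tss", "tss"]
predictions = ["ns", "tss", "tss", "tss", "ntss", "tss"]
recall = frame_recall(labels, predictions)
print(recall)  # 0.75
```

A missed target frame (a false negative) means the assistant drops part of the user's request, which is why recall matters more than precision in this setting; the false alarm on the "ntss" frame above does not lower recall.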

Authors

Bang Zeng

Wuhan University

Ming Cheng

Sinopec (China)

Yao Tian

Nankai University

Institutions

University of Science and Technology of China

Wuhan University

Duke Kunshan University

Cite this study

Zeng et al. (Mon,) studied this question.

synapsesocial.com/papers/68e7388db6db6435876b1ae9 — DOI: https://doi.org/10.1109/icassp48485.2024.10446042

Also consider

Synapse has enriched 5 closely related papers on similar topics. Consider them for comparative context:

FiLM: Visual Reasoning with a General Conditioning Layer · 2018 · 1,610 citations
MUSAN: A Music, Speech, and Noise Corpus · 2015 · 922 citations
SVVAD: Personal Voice Activity Detection for Speaker Verification · 2023 · 7 citations
VoxCeleb2: Deep Speaker Recognition · 2018 · 2,250 citations
Voice activity detection using harmonic frequency components in likelihood ratio test · 2010 · 41 citations
