Personal voice activity detection (PVAD) is increasingly used in speech assistants. Traditional PVAD schemes extract the target speaker's embedding from existing query reference speech through a pre-trained speaker verification model. Consequently, the performance of the PVAD model may suffer if the quality of the extracted speaker embedding is poor, for example when only wake word speech is used as the reference. In this work, we introduce a novel and efficient PVAD model. In contrast to conventional approaches that rely on speaker embeddings extracted from a pre-trained speaker verification model, our proposed method directly uses the raw frame-level features of the reference speech as the target speaker's attributes. In this way, our proposed model achieves an ultra-high recall rate, which is vital for speech assistant applications. The experimental results show the effectiveness of our proposed method both when existing query speech is used as the reference and when only wake word speech is used.
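To make the contrast in the abstract concrete, the toy sketch below compares two ways of conditioning on reference speech: collapsing the reference into a single pooled vector (a stand-in for an utterance-level speaker embedding) versus matching each input frame against the raw reference frames. Everything here is an assumption for illustration: the function names, the cosine-similarity matching, the 2-dimensional "features", and the 0.8 threshold are hypothetical. The paper's model is a trained neural network whose architecture is not described in this abstract, so this is not its implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pooled_scores(input_frames, reference_frames):
    """Hypothetical embedding-style baseline: average the reference frames
    into one vector, then score every input frame against that vector."""
    dim = len(reference_frames[0])
    count = len(reference_frames)
    mean = [sum(f[d] for f in reference_frames) / count for d in range(dim)]
    return [cosine(frame, mean) for frame in input_frames]


def frame_level_scores(input_frames, reference_frames):
    """Hypothetical frame-level alternative: score every input frame by its
    best match among the raw reference frames (no pooling step)."""
    return [
        max(cosine(frame, ref) for ref in reference_frames)
        for frame in input_frames
    ]


def detect(scores, threshold=0.8):
    """Mark a frame as target-speaker speech when its score passes the
    (assumed) threshold."""
    return [score >= threshold for score in scores]


# Toy data: a short reference containing two distinct "sounds" of the target
# speaker, and three input frames (two target frames, one non-target frame).
reference = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(detect(pooled_scores(inputs, reference)))
print(detect(frame_level_scores(inputs, reference)))
```

In this toy case, averaging the two reference frames blurs them into a vector that matches neither target frame strongly, so both target frames fall below the threshold and are missed. Matching against the individual reference frames recovers both while still rejecting the non-target frame. Real systems learn the comparison rather than using fixed cosine similarity, so the sketch only illustrates why keeping frame-level detail from a short reference can help recall.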
Bang Zeng
Wuhan University
Ming Cheng
Sinopec (China)
Yao Tian
Nankai University
University of Science and Technology of China
Wuhan University
Duke Kunshan University
Zeng et al. studied this question.
synapsesocial.com/papers/68e7388db6db6435876b1ae9 — DOI: https://doi.org/10.1109/icassp48485.2024.10446042