Automatic Speaker Verification (ASV) is extensively used in many security-sensitive domains, but the increasing prevalence of adversarial attacks has seriously compromised the trustworthiness of these systems. Targeted black-box attacks are the most formidable threat and are particularly challenging to counteract, while existing defenses exhibit limitations when applied in real-world scenarios. We propose VoiceDefense, a novel adversarial sample detection method that slices an audio sample into multiple segments and captures their local audio features with segment-specific ASV scores. The distributions of these scores differ distinctly between genuine and adversarial samples, and VoiceDefense leverages this difference for detection. VoiceDefense outperforms the state of the art with a best AUC of 0.9624 and is consistently effective against various attacks and perturbation budgets, all while maintaining remarkably low computational overhead.
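The segment-and-score idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the segment length, the ASV model, or the detector, so every name here (`slice_segments`, `segment_scores`, `score_distribution_features`, `toy_embed`) and every parameter value is a hypothetical stand-in. In particular, `toy_embed` is a placeholder for a real speaker-embedding model, and the summary statistics are only one plausible way to describe a score distribution.

```python
import numpy as np


def slice_segments(waveform, segment_len, hop_len):
    """Split a 1-D waveform into fixed-length, possibly overlapping segments."""
    if len(waveform) < segment_len:
        # Too short to slice: treat the whole waveform as a single segment.
        return waveform[np.newaxis, :]
    starts = range(0, len(waveform) - segment_len + 1, hop_len)
    return np.stack([waveform[s:s + segment_len] for s in starts])


def toy_embed(segment, n_bins=16):
    """Placeholder embedding (assumption): a coarse magnitude spectrum.

    A real system would call a trained ASV encoder here instead.
    """
    spectrum = np.abs(np.fft.rfft(segment))
    bands = np.array_split(spectrum, n_bins)
    return np.array([band.mean() for band in bands])


def cosine_score(a, b):
    """Cosine similarity, used here as a stand-in for an ASV score."""
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return float(np.dot(a, b) / denom)


def segment_scores(waveform, enrolled_embedding, embed_fn, segment_len, hop_len):
    """Score every segment of the waveform against the enrolled speaker."""
    segments = slice_segments(waveform, segment_len, hop_len)
    return np.array(
        [cosine_score(embed_fn(seg), enrolled_embedding) for seg in segments]
    )


def score_distribution_features(scores):
    """Summarize the per-segment score distribution (assumed feature set)."""
    return np.array([scores.mean(), scores.std(), scores.min(), scores.max()])


# Example with synthetic audio: 1 s at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)
enrolled = toy_embed(waveform[:4000])
scores = segment_scores(waveform, enrolled, toy_embed, segment_len=4000, hop_len=2000)
features = score_distribution_features(scores)
```

In a setup like this, the `features` vector would be passed to a separate detector (for example, a threshold or a small classifier) fitted on genuine and adversarial samples; that step is omitted because the abstract does not describe it.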
Kan et al. (Sun,) studied this question.