What question did this study set out to answer?

This research investigates the effect of data augmentation on improving vocal effort classification using self-supervised models.

May 14, 2026

Analyzing impact of data augmentation on vocal effort classification with self-supervised models

Key Points

Explored various data augmentation strategies to enhance robustness in vocal effort classification.
Utilized two vocal effort corpora, VocalEffort-1&2 and the AVID corpus, both of which span diverse vocal efforts.
Assessed classification performance improvements across several vocal effort categories.
Data augmentation significantly improved classification performance across vocal effort categories.
Self-supervised models showed enhanced generalization capabilities with augmented data.
Current models remain of limited reliability when labeled data does not adequately cover the full range of vocal effort.

Abstract

Traditional speech and speaker recognition systems are typically trained on neutrally phonated datasets, and their performance degrades significantly when speech deviates from this neutral state. Variations in vocal effort, ranging from whisper to shout, represent a critical but underexplored challenge for developing robust speech systems. To better address this issue, we investigate how various data augmentation strategies affect model performance. While prior work on vocal effort classification has relied on traditional acoustic features and limited datasets, our study focuses on leveraging recent self-supervised models for categorical vocal effort recognition. Despite increasing attention, current models exhibit limited reliability in vocal effort classification, underscoring the need to improve modeling approaches. Given the need for state-of-the-art performance and the limited availability of labeled data that adequately covers the full range of vocal effort, data augmentation is needed to improve model generalization. Here, we explore the use of data augmentation to improve robustness in vocal effort classification, applying a range of augmentation techniques and assessing their impact on classification performance across vocal effort categories. Our experiments use two vocal effort corpora that both span diverse vocal efforts: VocalEffort-1&2 (VE-1,2), developed by CRSS-UTDallas, and the AVID corpus. This study aims to uncover the limitations and potential of augmentation techniques.
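The abstract does not name the augmentation techniques used, so the following is only a minimal sketch of one common waveform-level augmentation: mixing in white Gaussian noise at a chosen signal-to-noise ratio. The function names, the SNR values, and the label string are hypothetical and chosen for illustration; they are not taken from the study.

```python
import numpy as np


def add_noise_at_snr(waveform, snr_db, rng):
    """Return `waveform` plus white Gaussian noise at the requested SNR (in dB)."""
    signal_power = np.mean(waveform ** 2)
    # Noise power needed so that 10 * log10(signal_power / noise_power) == snr_db.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, 1.0, size=waveform.shape)
    # Rescale so the empirical noise power matches the target exactly.
    noise *= np.sqrt(noise_power / np.mean(noise ** 2))
    return waveform + noise


def augment_dataset(waveforms, labels, snrs_db=(20.0, 10.0), seed=0):
    """Keep the originals and add one noisy copy per utterance per SNR.

    The SNR values are assumptions for illustration, not values from the study.
    Each noisy copy keeps the vocal effort label of its source utterance.
    """
    rng = np.random.default_rng(seed)
    out_waves, out_labels = list(waveforms), list(labels)
    for wave, label in zip(waveforms, labels):
        for snr_db in snrs_db:
            out_waves.append(add_noise_at_snr(wave, snr_db, rng))
            out_labels.append(label)
    return out_waves, out_labels
```

In a setup like this, the augmented waveforms would be fed to the self-supervised model alongside the originals during training, while evaluation would normally stay on unaugmented data so that any gain reflects generalization rather than matched noise conditions.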
