What question did this study set out to answer?

The research aims to evaluate the effectiveness of data augmentation techniques in enhancing speech-based detection of Parkinson's disease.

February 8, 2026 | Open Access

On the Suitability of Data Augmentation Techniques to Improve Parkinson’s Disease Detection with Speech Recordings

Key Points

Developed a data augmentation methodology for speech classification.
Utilized a deep learning model trained on Mel spectrograms.
Applied augmentation techniques at both waveform and time-frequency levels.
Conducted internal validation and tested on an independent dataset.
Carefully selected data augmentation techniques improved classification accuracy by up to 3% over the non-augmented baseline.
Improvements in accuracy did not consistently translate to better generalization on independent datasets.

Abstract

Background: Parkinson’s disease (PD) is a neurodegenerative disorder that affects millions of people worldwide. Speech analysis has emerged as a non-invasive tool for automatic PD detection; however, the scarcity and homogeneity of available datasets often limit the generalization capability of machine learning models, motivating the use of data augmentation strategies to improve robustness.

Methods: This study presents a data augmentation-based methodology for speech-based classification between PD patients and healthy control subjects. A deep learning model trained from scratch on Mel spectrograms is evaluated using augmentation techniques applied at both the waveform and time–frequency levels. Multiple training and model selection strategies are analyzed, and model performance is assessed through internal validation as well as on an independent dataset.

Results: Experimental results show that carefully selected data augmentation techniques improve classification performance with respect to the non-augmented counterpart, achieving gains of up to 3% in accuracy. However, when evaluated on an independent dataset, these improvements do not consistently translate into better generalization.

Conclusions: These findings demonstrate that, while data augmentation can effectively enhance model performance within a single dataset, this apparent robustness is not sufficient to guarantee generalization on independent speech corpora for PD detection.
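
To make the distinction between the two augmentation levels concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the specific techniques used, so additive white noise (waveform level) and random band masking (time-frequency level) are assumed here purely as common illustrative choices. The function names, parameters, and synthetic inputs are all hypothetical.

```python
import numpy as np


def add_noise_at_snr(waveform, snr_db, rng):
    """Waveform-level example (assumed technique): add white Gaussian
    noise scaled so the result has roughly the requested SNR in dB."""
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise


def mask_time_frequency(spec, max_freq_width, max_time_width, rng):
    """Time-frequency-level example (assumed technique): overwrite one
    random frequency band and one random time span of a spectrogram
    (shape: n_freq x n_time) with the spectrogram's minimum value."""
    out = spec.copy()  # leave the input untouched
    n_freq, n_time = out.shape
    fill = out.min()

    # Random frequency band of width 0..max_freq_width
    f_width = int(rng.integers(0, max_freq_width + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = fill

    # Random time span of width 0..max_time_width
    t_width = int(rng.integers(0, max_time_width + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    out[:, t_start:t_start + t_width] = fill
    return out


# Synthetic stand-ins for a speech recording and its Mel spectrogram
rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy = add_noise_at_snr(wave, snr_db=20.0, rng=rng)

spec = rng.random((64, 100))  # hypothetical 64 Mel bands x 100 frames
masked = mask_time_frequency(spec, max_freq_width=8, max_time_width=20, rng=rng)
```

In a pipeline like the one described, a waveform-level transform would be applied before the Mel spectrogram is computed, while a time-frequency transform would be applied to the spectrogram itself. Both would be used on training data only, so that the internal validation and independent test recordings stay unmodified.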

Authors

Cristian David Ríos-Urrego

Universidad de Antioquia

Tulio Andrés Ruiz-Romero

David Puerta-Lotero
