What type of study is this?

This is a quantitative study.

October 3, 2025 | Open Access

Multi-Channel Spectro-Temporal Representations for Speech-Based Parkinson’s Disease Detection

Key Points

Fusing complementary time-frequency representations enhances sensitivity to subtle speech impairments in Parkinson's disease.
EfficientNet-B2 achieved the highest accuracy (84.39%) and F1-score (84.35%) among the evaluated architectures.
Multi-channel fusion consistently outperforms single representations across architectures.
Emotionally salient and prosodically emphasized sentences yield higher AUC, suggesting that richer prosody improves discriminability.

Abstract

Early, non-invasive detection of Parkinson’s disease (PD) using speech analysis offers promise for scalable screening. In this work, we propose a multi-channel spectro-temporal deep-learning approach for PD detection from sentence-level speech, a clinically relevant yet underexplored modality. We extract and fuse three complementary time–frequency representations (mel spectrogram, constant-Q transform (CQT), and gammatone spectrogram) into a three-channel input analogous to an RGB image. This fused representation is evaluated across CNNs (ResNet, DenseNet, and EfficientNet) and a Vision Transformer using the PC-GITA dataset, under 10-fold subject-independent cross-validation for robust assessment. Results show that fusion consistently improves performance over single representations across architectures. EfficientNet-B2 achieves the highest accuracy (84.39% ± 5.19%) and F1-score (84.35% ± 5.52%), outperforming recent methods using handcrafted features or pretrained models (e.g., Wav2Vec2.0, HuBERT) on the same task and dataset. Performance varies with sentence type, with emotionally salient and prosodically emphasized utterances yielding higher AUC, suggesting that richer prosody enhances discriminability. Our findings indicate that multi-channel fusion enhances sensitivity to subtle speech impairments in PD by integrating complementary spectral information. They also suggest that multi-channel fusion could improve the detection of discriminative acoustic biomarkers and may offer a more robust framework for speech-based PD screening, though further validation is needed before clinical application.
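The fusion step described in the abstract, stacking three time–frequency representations into one RGB-like input, can be illustrated with a minimal sketch. This is not the authors' code. It assumes the mel, CQT, and gammatone spectrograms have already been computed as 2-D arrays, and the 224 × 224 target size, nearest-neighbour resizing, and per-channel min-max scaling are illustrative assumptions, since the abstract does not state the actual preprocessing. The function names `minmax_normalize`, `resize_nearest`, and `fuse_channels` are hypothetical.

```python
import numpy as np


def minmax_normalize(spec: np.ndarray) -> np.ndarray:
    """Scale a 2-D spectrogram to [0, 1]; a constant input maps to all zeros."""
    lo, hi = float(spec.min()), float(spec.max())
    if hi == lo:
        return np.zeros(spec.shape, dtype=np.float32)
    return ((spec - lo) / (hi - lo)).astype(np.float32)


def resize_nearest(spec: np.ndarray, out_shape: tuple) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D array to (height, width)."""
    rows = np.arange(out_shape[0]) * spec.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * spec.shape[1] // out_shape[1]
    return spec[np.ix_(rows, cols)]


def fuse_channels(mel, cqt, gammatone, out_shape=(224, 224)) -> np.ndarray:
    """Stack three spectrograms into a (3, H, W) array, one per channel.

    Assumption: each representation is resized to a common shape and
    normalized independently before stacking.
    """
    channels = [
        minmax_normalize(resize_nearest(np.asarray(s), out_shape))
        for s in (mel, cqt, gammatone)
    ]
    return np.stack(channels, axis=0)


# Placeholder inputs: random arrays standing in for real spectrograms
# with different numbers of frequency bins (values are not real data).
rng = np.random.default_rng(0)
mel = rng.random((128, 300))
cqt = rng.random((84, 300))
gammatone = rng.random((64, 300))

fused = fuse_channels(mel, cqt, gammatone)
print(fused.shape)  # (3, 224, 224)
```

The resulting channel-first array has the layout that image CNNs and Vision Transformers commonly expect, which is what makes the RGB analogy in the abstract practical.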

Authors

Hadi Sedigh Malekroodi

Bionics Institute

Nuwan Madusanka

Pukyong National University

Byeong-Il Lee

Pukyong National University

Journals

Journal of Imaging

Institutions

Pukyong National University
