Chronic obstructive pulmonary disease (COPD) is a prevalent respiratory disease, and early diagnosis is crucial for timely intervention and improved prognosis. Respiratory sound analysis is non-invasive and reflects airway pathology, which gives it great potential as an auxiliary diagnostic tool. However, existing methods often focus on detecting specific abnormal sounds, such as wheezing and crackling, rather than diagnosing the disease directly. Additionally, most approaches rely on a single feature type or a single architecture, which limits diagnostic accuracy. To address these issues, this paper proposes a multi-scale hybrid deep learning model that combines a Convolutional Neural Network (CNN), a Bidirectional Long Short-Term Memory network (BiLSTM), and a Vision Transformer (ViT) to capture temporal, spatial, and global contextual features from both raw signals and multi-scale Mel spectrograms. A Multi-Scale Dynamic Fusion (MSDF) module further integrates these features to enhance representation while balancing model complexity and performance. The model achieves accuracies of 99.23% on the ICBHI database and 98.48% on the KAUH/RespiratoryDatabase@TR hybrid database, demonstrating strong potential for effective clinical COPD diagnosis.
Dong et al. (Thu,) studied this question.
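The abstract does not specify how the MSDF module combines the branch features, so the following is only a minimal sketch of one common form of dynamic fusion: a softmax-gated weighted sum of same-sized feature vectors. The function name `dynamic_fusion`, the gate parameters, the feature dimension, and the random stand-in features are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def dynamic_fusion(branch_features, gate_weights):
    """Fuse K same-sized feature vectors with input-dependent weights.

    branch_features: list of K arrays, each of shape (D,)
    gate_weights:    array of shape (K, D); hypothetical learned parameters
    Returns the fused vector of shape (D,) and the K fusion weights.
    """
    feats = np.stack(branch_features)                      # (K, D)
    # One scalar relevance score per branch, computed from its own features.
    scores = np.einsum("kd,kd->k", gate_weights, feats)    # (K,)
    alphas = softmax(scores)                               # weights sum to 1
    fused = alphas @ feats                                 # (D,)
    return fused, alphas


# Random stand-ins for CNN, BiLSTM, and ViT branch outputs (assumed D = 8).
rng = np.random.default_rng(0)
D = 8
cnn_feat, bilstm_feat, vit_feat = (rng.standard_normal(D) for _ in range(3))
gate = rng.standard_normal((3, D))

fused, alphas = dynamic_fusion([cnn_feat, bilstm_feat, vit_feat], gate)
```

Because the weights depend on the input features, each branch's contribution can vary from one recording to the next. With all gate parameters set to zero, the sketch reduces to a plain average of the branches.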