Multi-label disease diagnosis in chest X-rays requires simultaneous consideration of global organ structures and local lesion characteristics. However, current methods primarily use single-branch architectures and lack effective attention guidance, which makes it difficult to balance global context against local detail. Furthermore, multi-label chest X-ray datasets often suffer from significant class imbalance. We propose CR-MSNet, a dual-branch multi-scale attention network for multi-label chest X-ray classification. The global branch is built on CoAtNet-2-rw to capture holistic semantic representations, while the local branch uses a residual convolutional neural network to extract detailed lesion features. A cross-attention mechanism enables adaptive interaction and information exchange between the global and local representations. Additionally, we propose a Parallel Multi-Scale Channel-Spatial Attention (PMS-CSA) module that enhances key semantic channels and potential lesion regions, thereby increasing the discriminative power of the feature representations. A two-stage training strategy with an adjusted loss function mitigates the detrimental effects of class imbalance on model performance. Experimental results show that CR-MSNet achieves a macro-average AUC of 0.847 on the ChestX-ray14 dataset, supporting its effectiveness for multi-label chest X-ray classification. By integrating a dual-branch architecture with multi-scale attention mechanisms, this study highlights the role of attention-guided feature interaction in reconciling global and local representations.
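For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-average AUC can be computed for multi-label predictions: the AUC is computed separately for each disease label and the per-label values are averaged with equal weight. It is an illustration only, not the evaluation code used for CR-MSNet; the function names `binary_auc` and `macro_auc` and the toy data are hypothetical.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one label: the fraction of (positive, negative) pairs in which
    the positive sample receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positives and negatives")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Unweighted mean of the per-label AUCs (rows are samples, columns are labels)."""
    n_labels = len(y_true[0])
    per_label = [
        binary_auc([row[c] for row in y_true], [row[c] for row in y_score])
        for c in range(n_labels)
    ]
    return sum(per_label) / n_labels


# Hypothetical toy data: 4 images, 2 disease labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.7, 0.1]]
result = macro_auc(y_true, y_score)  # per-label AUCs are 0.75 and 1.0
```

Because every label contributes equally to the mean, rare findings weigh as much as common ones, which is why this metric is commonly reported for class-imbalanced multi-label data.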
Wang et al. (Mon,) studied this question.