What question did this study set out to answer?

The aim is to develop a deep learning framework for classifying multiple thoracic diseases in chest X-rays using hybrid architectures and ensemble methods.

April 22, 2026 · Open Access

An Adaptive Deep Learning Framework for Multi-Label Chest X-Ray Diagnosis Using a Hybrid CNN–Transformer Architecture and Class-Wise Ensemble Fusion

Key points

The study developed a hybrid CNN–transformer architecture for multi-label classification, combining multi-scale DenseNet121 features, scalar-weighted fusion, positional encodings, and cross-attention.
NIH ChestX-ray14 was used for model development and internal testing; CheXpert and ChestX-Det10 were used for external validation.
AUROC was the primary metric, reported alongside precision, recall, and related measures, with statistical tests for significance.
The hybrid model achieved a mean AUROC of 0.8495 on the NIH internal test set; the Top-3 Grid Search ensemble raised this to 0.8577.
The ensemble outperformed the DenseNet121 baseline in both internal and external validation, with mean AUROCs of 0.6500 on CheXpert and 0.6592 on ChestX-Det10 (both p < 0.001).
Per-class analysis showed significant improvements for clinically important conditions such as cardiomegaly, mass, and pneumothorax.
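
The headline figures above are mean AUROCs, that is, the per-class AUROC averaged over all disease labels. The sketch below illustrates that calculation on toy data using only the Python standard library. It is not the authors' evaluation code: the function names (`auroc`, `mean_auroc`) and the example arrays are hypothetical, and the AUROC is computed with the rank-based (Mann-Whitney) formulation, which may differ in implementation from what the study used.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auroc(y_true, y_score):
    """Macro-average AUROC for multi-label data.

    y_true[i][j]  is 1 if image i carries label j, else 0.
    y_score[i][j] is the predicted score for label j on image i.
    Returns (mean over classes, list of per-class AUROCs).
    """
    n_classes = len(y_true[0])
    per_class = []
    for j in range(n_classes):
        labels = [row[j] for row in y_true]
        scores = [row[j] for row in y_score]
        per_class.append(auroc(labels, scores))
    return sum(per_class) / n_classes, per_class


# Toy example (hypothetical data): 4 images, 2 labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.4]]
mean, per_class = mean_auroc(y_true, y_score)
# mean is 0.875; per-class values are [1.0, 0.75]
```

Because the mean weights every class equally, a gain on a rare label moves it as much as a gain on a common one, which is one reason per-class results are worth reading alongside the average.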

Abstract

Background/Objectives: To develop and externally evaluate a deep learning framework for multi-label thoracic disease classification on chest radiographs using hybrid convolutional neural network (CNN)–transformer architectures, hierarchical scalar-weighted fusion, and ensemble strategies.
Methods: This retrospective, multi-center study used publicly available datasets: NIH ChestX-ray14 (112,120 images; 30,805 patients) for model development and internal testing, and CheXpert (223,415 images) plus ChestX-Det10 (3578 images) for external validation. Nine CNN–transformer hybrids were systematically benchmarked, and the proposed model incorporated multi-scale DenseNet121 features, scalar-weighted fusion, positional encodings, and cross-attention. Four post hoc ensemble methods were explored, including a class-wise Top-3 Grid Search. Performance was evaluated using AUROC as the primary metric, along with precision, recall, F1-score, accuracy, specificity, positive predictive value, and negative predictive value. Statistical comparisons were performed using bootstrapped resampling and appropriate parametric or non-parametric tests.
Results: On the NIH internal test set, the proposed hybrid model achieved a mean AUROC of 0.8495, significantly higher than that of the DenseNet121 baseline (0.8441, p = 0.032). The Top-3 Grid Search ensemble further improved internal performance, achieving a mean AUROC of 0.8577 (p < 0.00001). On external validation, the ensemble consistently outperformed DenseNet121, achieving mean AUROCs of 0.6500 on CheXpert (p < 0.001) and 0.6592 on ChestX-Det10 (p < 0.001). Per-class analysis revealed significant improvements for clinically important conditions such as cardiomegaly, mass, and pneumothorax. Grad-CAM visualizations showed strong alignment of predicted abnormalities with radiologically relevant regions.
Conclusions: This CNN–transformer framework, particularly when combined with class-wise ensemble strategies, provided modest but statistically significant improvements in multi-label chest X-ray classification. External validation suggested partial generalizability across datasets, although performance remained moderate under domain shift.
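
The abstract names a class-wise Top-3 Grid Search ensemble but does not define it. One plausible reading, offered here purely as an assumption, is that for each disease label the three models with the highest validation AUROC are kept and their scores are blended with convex weights chosen by grid search. The sketch below illustrates that reading for a single label; the function names, the grid resolution (`steps`), and the toy scores are all hypothetical and are not taken from the paper.

```python
from itertools import product


def auroc(labels, scores):
    """Rank-based AUROC; ties between a positive and a negative count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weight_grid(n_models, steps):
    """Yield every weight tuple on a grid of 1/steps increments that sums to 1."""
    for combo in product(range(steps + 1), repeat=n_models):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def fit_class_ensemble(labels, model_scores, top_k=3, steps=4):
    """Assumed procedure for ONE label (not confirmed by the source).

    labels:       0/1 ground truth per validation image.
    model_scores: model_scores[m][i] is model m's score for image i.
    Returns (indices of the kept models, best weights, best AUROC).
    """
    # Step 1: rank models by their individual AUROC and keep the top_k.
    ranked = sorted(
        range(len(model_scores)),
        key=lambda m: auroc(labels, model_scores[m]),
        reverse=True,
    )[:top_k]

    # Step 2: grid-search convex weights over the kept models.
    best_auc, best_w = -1.0, None
    for w in weight_grid(len(ranked), steps):
        blended = [
            sum(wi * model_scores[m][i] for wi, m in zip(w, ranked))
            for i in range(len(labels))
        ]
        auc = auroc(labels, blended)
        if auc > best_auc:  # strict: the first best weighting is kept
            best_auc, best_w = auc, w
    return ranked, best_w, best_auc


# Toy validation data (hypothetical): one label, four candidate models.
labels = [1, 1, 0, 0]
model_scores = [
    [0.9, 0.4, 0.5, 0.1],  # model 0
    [0.4, 0.9, 0.1, 0.5],  # model 1
    [0.5, 0.5, 0.5, 0.5],  # model 2 (uninformative)
    [0.1, 0.2, 0.8, 0.9],  # model 3 (inverted)
]
kept, weights, auc = fit_class_ensemble(labels, model_scores)
```

In a sketch like this the weights would need to be fitted on held-out validation data and then frozen before scoring the test sets; tuning them on the test set itself would inflate the reported AUROC. Repeating the fit independently for each label is what would make such an ensemble class-wise.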
