Computer-aided diagnosis relies heavily on the automatic classification of thoracic diseases from chest X-ray (CXR) images, yet this task remains challenging due to class imbalance, overlapping radiological features, and high inter-class similarity. In this study, two scalable Vision Transformer (ViT)-based architectures, MedViT and Hybrid CNN–ViT, are adapted and evaluated for multi-label thoracic disease classification. MedViT is enhanced with transfer learning, domain-specific augmentations, and self-attention mechanisms to capture subtle pathological patterns across diverse conditions. The Hybrid CNN–ViT combines the strengths of CNNs, which are effective at capturing local patterns, with those of ViTs. Both models are trained and validated on two benchmark datasets, NIH ChestX-ray14 and CheXpert, and compared against state-of-the-art baselines. On the NIH ChestX-ray14 dataset, MedViT achieved 93.34% accuracy and a macro AUROC of 94.17%, while the Hybrid CNN–ViT reached 85.81% accuracy and a macro AUROC of 72.28%. On the CheXpert dataset, MedViT achieved 79.22% accuracy and a macro AUROC of 75.11%, whereas the Hybrid CNN–ViT achieved 76.15% accuracy and a macro AUROC of 71.68%. These results show that MedViT performs well and generalizes effectively across different datasets. Per-label analysis demonstrated robust precision and recall even for under-represented conditions such as fibrosis and hernia, where existing models typically show significant performance drops. Unlike earlier methods that often struggle with generalization, MedViT maintains a balanced trade-off between sensitivity and specificity across all categories. These findings highlight the effectiveness of Transformer-based feature encoding in capturing subtle spatial correlations in medical imaging, while also setting new benchmarks for automated thoracic disease classification.
The MedViT model outperformed the state-of-the-art methods and shows strong potential to support radiologists in decision-making and to improve diagnostic workflows in clinical practice.
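The macro AUROC figures reported above are averages over disease labels. The abstract does not say how the metric was computed, so the following is only a minimal sketch of the standard definition, assuming one binary ground-truth column and one score column per label: the AUROC of each label is computed separately (here through the pairwise Mann-Whitney formulation) and the per-label values are averaged with equal weight. The function names `auroc` and `macro_auroc` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC for a single label: the fraction of (positive, negative)
    pairs in which the positive sample gets the higher score.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(y_true: Sequence[Sequence[int]],
                y_score: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of the per-label AUROCs (rows = images, columns = labels)."""
    n_labels = len(y_true[0])
    per_label: List[float] = []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true]
        col_score = [row[j] for row in y_score]
        per_label.append(auroc(col_true, col_score))
    return sum(per_label) / n_labels


# Toy example (illustrative values only): 4 images, 2 hypothetical labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.4, 0.5]]
print(macro_auroc(y_true, y_score))
```

Because every label carries the same weight in the average, a rare finding such as hernia influences the macro score as much as a common one, which is why macro averaging is a natural choice when class imbalance is a concern.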
Victor Mawutor Agbo
Marwadi University
Ruchi Patel
Rutgers, The State University of New Jersey
Munindra Lunagaria
Marwadi University
Scientific Reports
Barkatullah University
Manipal University Jaipur
Marwadi University
Agbo et al. (Fri,) studied this question.
synapsesocial.com/papers/69db37404fe01fead37c545a — DOI: https://doi.org/10.1038/s41598-026-43282-5