April 29, 2026 (Open Access)

Pathology aware hierarchical transformers for multi-label thoracic disease classification using chest X-rays

Key Points

This research aims to improve multi-label chest X-ray classification by addressing pathology dependencies, class imbalance, and interpretability.
Developed a Hierarchical Pathology-aware Vision Transformer (HP-ViT) architecture.
Used Hierarchical Pathology-Aware Attention (HPAA) to model disease co-occurrence and Multi-Scale Feature Aggregation (MSFA) to detect both localized and diffuse abnormalities.
Implemented Balanced Adaptive Focal Loss (BAFL), which progressively shifts training from class-balanced weighting toward difficult examples.
HP-ViT achieved a macro-F1 of 0.924 and an exact match ratio of 0.842, improvements of 1.76% and 1.32%, respectively, over state-of-the-art methods.
Its positive predictive value (PPV) was 0.925, a 1.5% improvement, indicating that most of its positive predictions are correct.
The improvements were statistically significant (p<0.001, McNemar's test on per-sample exact-match correctness).
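
The evaluation metrics listed in the key points can be computed as in the following sketch, which uses only NumPy and the standard library. It is not the authors' evaluation code. It assumes that PPV is macro-averaged precision, that a class with an empty denominator scores 0, and that McNemar's test uses the continuity-corrected chi-square form; the study may instead use the exact binomial variant or a different averaging. All function names are hypothetical. For McNemar's test, each model's per-sample correctness is taken to be whether its full label vector matches the ground truth, which follows the exact-match criterion the study reports.

```python
import math
import numpy as np

def _counts(y_true, y_pred):
    """Per-class true positives, false positives and false negatives for (N, C) label arrays."""
    t, p = np.asarray(y_true).astype(bool), np.asarray(y_pred).astype(bool)
    tp = (t & p).sum(axis=0)
    fp = (~t & p).sum(axis=0)
    fn = (t & ~p).sum(axis=0)
    return tp, fp, fn

def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 (assumption: empty classes score 0)."""
    tp, fp, fn = _counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())

def macro_ppv(y_true, y_pred) -> float:
    """Macro-averaged precision (PPV); classes never predicted positive score 0 (assumption)."""
    tp, fp, _ = _counts(y_true, y_pred)
    denom = tp + fp
    ppv = np.where(denom > 0, tp / np.maximum(denom, 1), 0.0)
    return float(ppv.mean())

def exact_match_ratio(y_true, y_pred) -> float:
    """Fraction of images whose entire label vector is predicted exactly."""
    t, p = np.asarray(y_true).astype(bool), np.asarray(y_pred).astype(bool)
    return float(np.mean(np.all(t == p, axis=1)))

def mcnemar_p(correct_a, correct_b) -> float:
    """Two-sided McNemar p-value (continuity-corrected chi-square) on paired correctness."""
    ok_a = np.asarray(correct_a, dtype=bool)
    ok_b = np.asarray(correct_b, dtype=bool)
    b = int(np.sum(ok_a & ~ok_b))  # model A right, model B wrong
    c = int(np.sum(~ok_a & ok_b))  # model A wrong, model B right
    if b + c == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Chi-square survival function with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])
print(exact_match_ratio(y_true, y_pred))  # 0.5
```

To compare two models, pass `np.all(y_true == pred_a, axis=1)` and `np.all(y_true == pred_b, axis=1)` to `mcnemar_p`. Only the discordant pairs (one model right, the other wrong) enter the statistic.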

Abstract

Multi-label chest X-ray classification faces three critical challenges: (i) inadequate modeling of inter-pathology dependencies despite clinical co-occurrence patterns, (ii) severe class imbalance (11.2–47.6%) causing minority-class underperformance, and (iii) limited interpretability hindering clinical trust. Existing methods address these challenges independently; no current framework jointly models pathology dependencies, imbalance-aware training, and interpretable attention. We propose a Hierarchical Pathology-aware Vision Transformer (HP-ViT), which addresses these limitations jointly in a unified architecture by employing: Hierarchical Pathology-Aware Attention (HPAA) for explicit disease co-occurrence modeling through two-stage token refinement; Multi-Scale Feature Aggregation (MSFA) for detecting localized and diffuse abnormalities across four hierarchical scales; and Balanced Adaptive Focal Loss (BAFL), which implements curriculum-scheduled focal modulation that progressively transitions from class-balanced to difficulty-focused training. Evaluated on COVIDx, ChestX-ray14, and BIMCV-COVID19+ (N=36,904 images), HP-ViT achieves a macro-F1 of 0.924, an exact match ratio of 0.842, and a PPV of 0.925, representing improvements of 1.76%, 1.32%, and 1.5% over the state of the art, with statistical significance (p<0.001, McNemar's test on per-sample exact-match correctness). HP-ViT requires only 12.6M parameters (an 85% reduction vs. ViT-B/16) with an inference time of 29.8 ms, enabling real-time clinical deployment. Interpretability evaluation yields an 83.7% mean SSIM between attention maps and radiologist annotations, confirming pathology-aligned localization.
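
The abstract describes BAFL only at a high level, so the following is one possible reading of "curriculum-scheduled focal modulation that progressively transitions from class-balanced to difficulty-focused training", not the authors' loss. Every concrete choice here is an assumption: inverse positive-frequency class weights normalized to mean 1, a linear curriculum coefficient λ that ramps from 0 to 1 over a hypothetical `warmup_epochs`, a focal exponent γ = λ·γ_max with an assumed γ_max = 2, and class weights that fade toward uniform as λ grows. Under these assumptions the loss is class-weighted binary cross-entropy at λ = 0 and a standard unweighted binary focal loss at λ = 1. The function names are hypothetical.

```python
import numpy as np

def class_balanced_weights(labels: np.ndarray) -> np.ndarray:
    """Inverse positive-frequency weight per class, normalized to mean 1 (assumed scheme)."""
    pos_freq = np.clip(labels.mean(axis=0), 1e-6, None)  # fraction of positive images per class
    w = 1.0 / pos_freq
    return w / w.mean()

def curriculum_lambda(epoch: int, warmup_epochs: int) -> float:
    """Assumed linear schedule: 0 = class-balanced phase, 1 = difficulty-focused phase."""
    return min(1.0, epoch / max(1, warmup_epochs))

def balanced_adaptive_focal_loss(probs, labels, class_w, epoch,
                                 warmup_epochs=10, gamma_max=2.0, eps=1e-7) -> float:
    """Hypothetical BAFL-style multi-label loss on sigmoid probabilities of shape (N, C)."""
    lam = curriculum_lambda(epoch, warmup_epochs)
    gamma = lam * gamma_max                      # focal exponent grows with the curriculum
    w = (1.0 - lam) * class_w + lam              # class weights fade toward uniform (1.0)
    p = np.clip(probs, eps, 1.0 - eps)           # avoid log(0)
    p_t = np.where(labels == 1, p, 1.0 - p)      # probability assigned to the true label
    modulation = (1.0 - p_t) ** gamma            # down-weights easy, confidently correct labels
    loss = -w[None, :] * modulation * np.log(p_t)
    return float(loss.mean())

labels = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 0]])
probs = np.array([[0.9, 0.2, 0.1], [0.7, 0.6, 0.3], [0.8, 0.1, 0.4], [0.3, 0.2, 0.2]])
class_w = class_balanced_weights(labels)
for epoch in (0, 5, 10):
    print(epoch, balanced_adaptive_focal_loss(probs, labels, class_w, epoch))
```

Fading the class weights while raising γ is only one way to realize a class-balanced-first, difficulty-focused-later schedule; the paper's actual weighting scheme, schedule shape, and γ range may differ.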
