What question did this study set out to answer?

The central aim is to develop an efficient framework for classifying pulmonary diseases based on chest X-ray images.

April 17, 2026 | Open Access

An Informatics-Driven Hybrid Vision Transformer–DenseNet Framework with Attention Mechanisms for Scalable Chest X-Ray–Based Pulmonary Disease Classification

Key points

Designed an informatics-driven hybrid deep learning architecture called VD-MHA Net.
Integrated a Vision Transformer branch for global feature representation with a DenseNet121 backbone for local feature extraction.
Employed a multi-head attention-based module to enhance feature fusion and prioritization.
Evaluated the framework on a multi-source dataset featuring diverse diagnostic categories.
VD-MHA Net achieved robust diagnostic performance across four categories: normal, COVID-19, pneumonia, and lung opacity.
Reported performance in terms of mean area under the curve (AUC) and balanced accuracy, alongside improvements in F1-score and Matthews correlation coefficient (MCC).
Exhibited stable convergence and high computational scalability.
Highlighted the model's focus on clinically relevant regions through SHAP-based explainability analysis.
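
The metrics named in the key points can all be derived from a four-class confusion matrix. The following is a minimal, dependency-free sketch, not the authors' evaluation code. The toy labels are invented for illustration, macro averaging for the F1-score is an assumption (the summary does not state the averaging scheme), and the MCC uses the standard multiclass generalization. AUC is omitted because it requires per-class scores rather than hard predictions.

```python
import math

# Class order is an assumption; the summary only names the four categories.
CLASSES = ["normal", "COVID-19", "pneumonia", "lung opacity"]


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def balanced_accuracy(cm):
    """Mean per-class recall over classes that occur in the ground truth."""
    recalls = [cm[k][k] / sum(cm[k]) for k in range(len(cm)) if sum(cm[k]) > 0]
    return sum(recalls) / len(recalls)


def macro_f1(cm):
    """Unweighted mean of per-class F1 (macro averaging is assumed here)."""
    n = len(cm)
    f1s = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[r][k] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n


def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient from a confusion matrix."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[k][k] for k in range(n))                      # correct predictions
    t = [sum(cm[k]) for k in range(n)]                       # true count per class
    p = [sum(cm[r][k] for r in range(n)) for k in range(n)]  # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


# Toy labels (hypothetical, NOT results from the study).
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0, 3, 2]
cm = confusion_matrix(y_true, y_pred, len(CLASSES))
print("balanced accuracy:", balanced_accuracy(cm))
print("macro F1:", macro_f1(cm))
print("MCC:", multiclass_mcc(cm))
```

Balanced accuracy and MCC are often preferred over plain accuracy when class sizes differ, which is common in multi-source chest X-ray collections.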

Abstract

The increasing volume of medical imaging data in modern healthcare systems has created significant challenges for efficient, accurate, and scalable diagnostic decision support. Chest X-ray (CXR) imaging, as one of the most widely used radiological modalities, requires robust informatics-oriented analytical frameworks capable of integrating heterogeneous visual patterns while maintaining computational efficiency and interpretability. In this study, we present an informatics-driven hybrid deep learning framework, termed VD-MHA Net, designed for scalable pulmonary disease classification from CXR images. The proposed architecture integrates a Vision Transformer (ViT) branch for global contextual representation with a DenseNet121 backbone for localized structural feature extraction. To effectively combine these heterogeneous representations, a lightweight multi-head attention-based fusion module is employed, enabling adaptive feature prioritization within a unified decision-making pipeline. The framework was evaluated on a curated multi-source dataset comprising CXR images across four diagnostic categories: normal, COVID-19, pneumonia, and lung opacity. Experimental results demonstrate that VD-MHA Net achieves robust diagnostic performance, as measured by mean AUC, balanced accuracy, F1-score, and MCC, while exhibiting stable convergence and computational scalability. Furthermore, SHAP-based explainability analysis confirms that the proposed system focuses on clinically relevant pulmonary regions, enhancing transparency and trust in clinical decision-support environments. Overall, this study highlights the potential of hybrid attention-driven architectures as practical medical informatics solutions for large-scale radiographic analysis and supports their integration into data-driven clinical workflows.
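
To make the fusion idea concrete, the sketch below treats one feature vector per branch as a token and applies multi-head self-attention across the two tokens. It is an illustration under stated assumptions, not the VD-MHA Net implementation: the abstract does not specify the fusion design, so the feature sizes, the random stand-in weights, the mean pooling, and the omitted output projection are all hypothetical choices, and plain Python lists stand in for real ViT and DenseNet121 features.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned projection (hypothetical random weights)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def mha_fuse(tokens, Wq, Wk, Wv, num_heads):
    """Multi-head self-attention over branch tokens, then mean pooling.

    Returns (fused_vector, weights), where weights[h][i][j] is the attention
    that head h assigns to token j when updating token i.
    """
    d = len(tokens[0])
    assert d % num_heads == 0, "model width must be divisible by num_heads"
    hd = d // num_heads
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    outputs = [[] for _ in tokens]
    weights = []
    for h in range(num_heads):
        lo, hi = h * hd, (h + 1) * hd
        head_weights = []
        for i, q in enumerate(Q):
            # Scaled dot-product scores of token i against every token.
            scores = [sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(hd)
                      for k in K]
            w = softmax(scores)
            head_weights.append(w)
            # Attention-weighted mix of this head's value slices.
            outputs[i].extend(
                sum(w[j] * V[j][lo + c] for j in range(len(tokens)))
                for c in range(hd)
            )
        weights.append(head_weights)
    # Mean-pool the attended tokens into one fused vector (an assumed choice).
    fused = [sum(out[c] for out in outputs) / len(outputs) for c in range(d)]
    return fused, weights


rng = random.Random(0)
d_model, num_heads = 8, 2                           # hypothetical sizes
vit_feat = [rng.gauss(0, 1) for _ in range(12)]     # stand-in for a ViT global feature
dense_feat = [rng.gauss(0, 1) for _ in range(16)]   # stand-in for a DenseNet121 pooled feature
# Project both branches to a shared width so they can attend to each other.
tokens = [matvec(random_matrix(d_model, 12, rng), vit_feat),
          matvec(random_matrix(d_model, 16, rng), dense_feat)]
Wq, Wk, Wv = (random_matrix(d_model, d_model, rng) for _ in range(3))
fused, attn = mha_fuse(tokens, Wq, Wk, Wv, num_heads)
print(len(fused), attn[0])
```

In a trained model the attention weights would indicate how strongly each head draws on the global versus the local branch for a given image, which is one plausible reading of "adaptive feature prioritization"; a classifier head over `fused` would then produce the four-class prediction.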
