What question did this study set out to answer?

This study investigates the efficacy of unimodal versus multimodal deep learning models for diagnosing chest diseases.

March 4, 2026 (Open Access)

Scalable Unimodal and Multimodal Deep Learning for Multi-Label Chest Disease Detection: A Comparative Analysis

Key Points

The study compares unimodal and multimodal deep learning models for diagnosing chest diseases.
Twelve models were developed using the ResNet50, EfficientNetB3, and DenseNet121 architectures.
Each architecture was trained in unimodal (image-only) and multimodal (image plus clinical data) configurations.
Experiments used two versions of the NIH Chest X-ray Dataset, with 5606 and 121,120 samples, respectively.
Model performance was evaluated using label-based Area Under the Receiver Operating Characteristic Curve (AUROC) metrics.
Multimodal fusion consistently outperformed unimodal approaches across all architectures and data scales.
The improvements were more pronounced on the large-scale dataset.
Increased data volume improved generalization and reduced performance variance, particularly for rare pathologies.
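
To make the evaluation metric in the key points concrete, the following is a minimal sketch of label-based AUROC for a multi-label problem: one AUROC per disease label, then an unweighted (macro) average. It is not the authors' evaluation code. The function names `binary_auroc` and `label_based_auroc` are hypothetical, the rank-sum formulation is one standard way to compute AUROC, and skipping labels that have only one class present is an assumption made here for illustration.

```python
from typing import List, Optional, Tuple


def binary_auroc(y_true: List[int], y_score: List[float]) -> Optional[float]:
    """AUROC for a single label via the rank-sum (Mann-Whitney U) formulation.

    Returns None when the label has only positives or only negatives,
    because AUROC is undefined in that case.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None

    # Rank samples by score (1-based); tied scores share their average rank.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def label_based_auroc(
    y_true: List[List[int]], y_score: List[List[float]]
) -> Tuple[List[Optional[float]], Optional[float]]:
    """Per-label AUROC plus the macro average over labels where it is defined.

    y_true[i][l] is 1 if sample i has label l; y_score[i][l] is the model score.
    """
    n_labels = len(y_true[0])
    per_label = []
    for l in range(n_labels):
        col_true = [row[l] for row in y_true]
        col_score = [row[l] for row in y_score]
        per_label.append(binary_auroc(col_true, col_score))
    defined = [a for a in per_label if a is not None]
    macro = sum(defined) / len(defined) if defined else None
    return per_label, macro


if __name__ == "__main__":
    # Toy data: 4 samples, 2 labels (values are illustrative, not study data).
    truth = [[0, 1], [0, 0], [1, 1], [1, 0]]
    scores = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.1]]
    print(label_based_auroc(truth, scores))
```

Reporting the per-label values alongside the macro average matters for rare pathologies, since a single average can hide large differences between common and rare labels.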

Abstract

Background/Objectives: Early and accurate diagnosis of chest diseases is a critical challenge in clinical practice, particularly in scenarios where multiple pathologies may coexist. While deep learning-based medical image analysis has shown promising results, most existing studies rely on unimodal data and fixed-scale datasets, limiting their generalizability and clinical relevance. In this study, we present a comprehensive comparative analysis of unimodal and multimodal deep learning models for multi-label chest disease classification using chest X-ray images and associated clinical metadata.

Methods: A total of twelve models were developed based on three widely used convolutional neural network architectures (ResNet50, EfficientNetB3, and DenseNet121) under both unimodal (image-only) and multimodal (image + clinical data) configurations. To systematically investigate the impact of data scale, experiments were conducted on two distinct versions: the Random Sample of NIH Chest X-ray Dataset and the NIH Chest X-ray Dataset, containing 5606 and 121,120 samples, respectively. Model performance was evaluated using label-based Area Under the Receiver Operating Characteristic Curve (AUROC) metrics.

Results: Experimental results demonstrate that multimodal fusion consistently outperforms unimodal approaches across all architectures and data scales, with more pronounced improvements observed in large-scale settings. Furthermore, increasing data volume leads to improved generalization and reduced performance variance, particularly for rare pathologies.

Conclusions: These findings highlight the effectiveness of multimodal, multi-label learning in enhancing diagnostic accuracy and support the development of robust clinical decision support systems for chest disease assessment.
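
The abstract does not say how image and clinical data are combined, so the following is only a sketch of one common option: concatenating an image feature vector with encoded clinical metadata and applying an independent sigmoid per disease label. Everything named here is an assumption for illustration: the metadata fields (age, sex, view position) and their encoding, the helper names `encode_metadata` and `fuse_and_classify`, and the single linear output layer. In a real pipeline the image features would come from a CNN backbone such as ResNet50, EfficientNetB3, or DenseNet121.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def encode_metadata(age: float, sex: str, view: str) -> List[float]:
    """Hypothetical encoding: age scaled to [0, 1], binary flags for sex and view."""
    return [
        min(max(age, 0.0), 100.0) / 100.0,
        1.0 if sex == "M" else 0.0,
        1.0 if view == "PA" else 0.0,
    ]


def fuse_and_classify(
    image_features: Sequence[float],
    metadata_features: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Concatenate both modalities, then score each label independently.

    weights[l] holds one weight per fused feature for label l. Because each
    label gets its own sigmoid, several diseases can be predicted at once
    (multi-label), unlike a softmax that forces a single class.
    """
    fused = list(image_features) + list(metadata_features)
    probs = []
    for w_row, b in zip(weights, biases):
        if len(w_row) != len(fused):
            raise ValueError("weight row length must match fused feature length")
        logit = sum(w * x for w, x in zip(w_row, fused)) + b
        probs.append(sigmoid(logit))
    return probs


if __name__ == "__main__":
    # Stand-in for CNN backbone output; real feature vectors are much longer.
    img = [0.2, -0.5, 1.3, 0.7]
    meta = encode_metadata(age=58, sex="F", view="PA")
    n_labels = 3
    w = [[0.1 * (l + 1)] * (len(img) + len(meta)) for l in range(n_labels)]
    b = [0.0] * n_labels
    print(fuse_and_classify(img, meta, w, b))
```

The unimodal configuration corresponds to passing an empty metadata vector, which makes it straightforward to compare both settings with the same classification head.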

Authors

Diğdem Orhan

Fırat University

Murat Uçan

Dicle University

R. Alhajj

University of Calgary
