What question did this study set out to answer?

The aim was to create an explainable AI model to accurately detect pneumothorax using lung ultrasound images and evaluate its performance against expert clinicians.

April 19, 2026 · Open Access

Explainable transfer learning ensemble AI model for lung ultrasound pneumothorax detection with expert benchmark

Key Points

Developed a soft-voting ensemble model trained on 1,856 ultrasound clips from critically ill patients, healthy volunteers, and cadaver models
Ensured model interpretability through visualization, with heatmaps validated by expert clinicians
Benchmarked the model against 11 experienced clinicians on an independent, balanced test set
Conducted statistical analyses of sensitivity, specificity, and inter-rater reliability
Achieved 100% sensitivity and 100% specificity, surpassing the expert clinicians on both measures
Expert performance varied with ultrasound mode, with significantly lower specificity in M-mode imaging (p < 0.001)
The AI maintained perfect sensitivity and produced fewer false positives than clinicians across all conditions, including challenging scenarios such as subtle pleural motion
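
To make the soft-voting idea concrete, here is a minimal sketch in Python. Soft voting averages each base model's predicted class probabilities and picks the class with the highest mean, rather than counting one vote per model. The three probability rows, the class order, and the function name `soft_vote` are illustrative assumptions; this summary does not specify the number of base models, their architectures, or any weighting.

```python
def soft_vote(prob_rows, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    prob_rows: one list of class probabilities per base model.
    weights:   optional per-model weights; equal weights if omitted.
    Returns (mean_probabilities, index_of_predicted_class).
    """
    if weights is None:
        weights = [1.0] * len(prob_rows)
    total = sum(weights)
    n_classes = len(prob_rows[0])
    mean = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=mean.__getitem__)
    return mean, predicted


# Hypothetical outputs for one clip; class order assumed to be
# [no pneumothorax, pneumothorax]. These numbers are made up.
clip_probs = [
    [0.55, 0.45],  # model A leans "no pneumothorax"
    [0.60, 0.40],  # model B leans "no pneumothorax"
    [0.05, 0.95],  # model C is highly confident in "pneumothorax"
]
mean_probs, label = soft_vote(clip_probs)
print(mean_probs, label)
```

In this made-up case a simple majority vote would pick class 0 (two models against one), while soft voting picks class 1 because the one confident model outweighs two marginal ones. That sensitivity to confidence is the usual motivation for soft voting over hard voting.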

Abstract

Background: Lung ultrasound is essential for rapid, radiation-free bedside pneumothorax diagnosis but limited by variability in human interpretation. Key gaps include insufficiently large and diverse human datasets, inconsistent image acquisition, lack of rigorous expert benchmarking, and inadequate clinical interpretability of existing artificial intelligence models. We aimed to develop and validate a robust, explainable artificial intelligence (AI) ensemble model addressing these critical gaps.

Methods: With our multidisciplinary team, we developed an explainable soft-voting ensemble model trained on 1,856 diverse ultrasound clips from critically ill patients, healthy volunteers, and tailored cadaver models. Model interpretability was ensured using visualization, with heatmaps validated by expert clinicians. The model’s diagnostic performance was rigorously benchmarked against 11 experienced clinicians using an independent, balanced test set. Statistical analyses included sensitivity, specificity and inter-rater reliability.

Results: The ensemble model achieved 100% sensitivity (95% CI: 85·8%-100·0%) and 100% specificity (95% CI: 85·8%-100·0%), surpassing expert sensitivity and specificity. Diagnostic performance of experts significantly differed by ultrasound mode, with notably lower specificity in M-mode imaging (p < 0·001). The AI consistently maintained perfect sensitivity and significantly reduced false positives compared to clinicians across all conditions, including challenging diagnostic scenarios (e.g., subtle pleural motions), and showed excellent generalizability to both cadaveric and clinical cases.

Conclusions: Our explainable AI ensemble robustly matches the consensus-level performance of an expert "committee," significantly reducing diagnostic variability and false-positive diagnoses. This AI tool can serve as a critical second reader, standardize clinical decisions at the bedside, and substantially improve patient safety.
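
The reported confidence intervals can be sanity-checked with an exact (Clopper-Pearson) binomial interval. The sketch below rests on two assumptions: this summary does not say which interval method was used, and it does not give the number of test cases per class. A lower bound of 85·8% for a perfect score is consistent with 24 of 24 cases correct, so that count is used purely for illustration. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, iters=60):
    """Bisection on [0, 1] for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes out of n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2 (0 when x == 0).
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2 (1 when x == n).
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


# Assumed count (not stated in this summary): 24 of 24 positives detected.
low, high = clopper_pearson(24, 24)
print(f"{low:.1%} to {high:.1%}")  # lower bound is about 85.8%
```

With a perfect score the lower bound has the closed form (alpha / 2)^(1/n), which shows why the interval stays wide even at 100% observed accuracy: the width here reflects the size of the test set, not any observed errors.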
