February 5, 2026 · Open Access

Assessing the Diagnostic Accuracy of BiomedCLIP for Detecting Contrast Use and Esophageal Strictures in Pediatric Radiography

Key Points

The study aims to assess how well BiomedCLIP can classify pediatric esophageal X-rays into specific categories related to contrast and strictures.
Retrospective analysis of 143 pediatric esophageal X-rays from 2021 to 2025.
Each image annotated for contrast presence, esophageal visibility, and stricture occurrence by two pediatric radiology experts.
Zero-shot classification setup for BiomedCLIP without fine-tuning.
Model predictions evaluated against ground truth using 27 performance metrics.
BiomedCLIP achieved 88.7% precision and an AUC of 0.854 for detecting contrast presence.
Low specificity (20%) resulted in a high false-positive rate.
The model correctly identified all non-visible esophagus cases; full visibility could not be evaluated because the dataset contained no positive cases.
Performance for detecting esophageal strictures was poor: 24% accuracy, 44% sensitivity, and 18% specificity.
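The zero-shot setup summarized above can be illustrated with a minimal sketch. CLIP-style models such as BiomedCLIP classify an image without fine-tuning by comparing its embedding with the embedding of one text prompt per class: cosine similarities are scaled and passed through a softmax. The sketch assumes the image and prompt embeddings have already been produced by the model's encoders; the prompt wording, the logit scale of 100.0, the 0.5 threshold, and the helper names are hypothetical illustrations, not the study's configuration.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_probs(
    image_emb: Sequence[float],
    prompt_embs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumption: stands in for the model's learned scale
) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between one image and each class prompt."""
    img = l2_normalize(image_emb)
    logits = {
        label: logit_scale * sum(a * b for a, b in zip(img, l2_normalize(emb)))
        for label, emb in prompt_embs.items()
    }
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits.values())
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}


def to_binary(probs: Dict[str, float], positive_label: str, threshold: float = 0.5) -> int:
    """Collapse class probabilities into a 0/1 prediction for one target finding."""
    return int(probs[positive_label] >= threshold)


# Hypothetical toy 3-d embeddings; real BiomedCLIP embeddings are much larger.
image = [1.0, 0.0, 0.0]
prompts = {
    "contrast agent present": [0.9, 0.1, 0.0],
    "no contrast agent": [0.0, 1.0, 0.0],
}
probs = zero_shot_probs(image, prompts)
print(probs, to_binary(probs, "contrast agent present"))
```

In practice the embeddings would come from the model's image and text encoders (for example through the `open_clip` library); the abstract does not state how the study's prompts or binarization thresholds were chosen, so those choices here are placeholders.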

Abstract

Background/Objectives: Vision–language models such as BiomedCLIP are increasingly being investigated for their diagnostic potential in medical imaging. Although these foundation models show promise in general radiographic interpretation, their application in pediatric domains, particularly for subtle postoperative findings such as esophageal strictures, remains underexplored. This study aimed to evaluate the diagnostic performance of BiomedCLIP in classifying pediatric esophageal radiographs into three clinically relevant categories: presence of contrast agent, full esophageal visibility, and presence of esophageal stricture.

Methods: We retrospectively analyzed 143 pediatric esophageal X-rays collected between 2021 and 2025. Each image was annotated by two pediatric radiology experts for esophageal visibility, contrast presence, and stricture occurrence. BiomedCLIP was used in a zero-shot classification setup without fine-tuning. Model predictions were converted into binary outcomes and assessed against the ground truth using a suite of 27 performance metrics, including accuracy, sensitivity, specificity, F1-score, AUC, and calibration analyses.

Results: BiomedCLIP achieved high precision (88.7%) and a favorable AUC (0.854) in detecting contrast agent presence, although specificity remained low (20%), leading to a high false-positive rate. The model correctly identified all cases of non-visible esophagus, but its ability to predict full visibility could not be tested because the dataset contained no positive cases. Critically, its performance in detecting esophageal strictures was poor: accuracy 24%, sensitivity 44%, specificity 18%, and AUC 0.26. Statistical overlap between contrast and stricture predictions indicated a lack of semantic differentiation within the model’s latent space.

Conclusions: BiomedCLIP shows potential in detecting high-salience features such as contrast but fails to reliably identify esophageal strictures. Limitations include class imbalance, absence of fine-tuning, and architectural constraints in recognizing subtle morphologic abnormalities. These findings emphasize the need for domain-specific adaptation of foundation models before clinical implementation in pediatric radiology.
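The evaluation step, comparing binarized predictions with the expert ground truth, can be sketched with a handful of the listed metrics. This is a minimal illustration, not the study's 27-metric suite: the helper names are hypothetical, AUC is computed with the rank-based (Mann–Whitney) formulation, and the Brier score is used as one example of a calibration measure, since the abstract does not name the calibration methods used. Metrics whose denominator is zero return NaN, which mirrors why full esophageal visibility was untestable: with no positive cases, sensitivity is undefined.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels coded as 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def ratio(num: float, den: float) -> float:
    """Divide, returning NaN when the denominator is zero (metric undefined)."""
    return num / den if den else math.nan


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of actual positives detected
        "specificity": ratio(tn, tn + fp),  # low values mean many false positives
        "precision": ratio(tp, tp + fp),    # share of predicted positives that are correct
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


# Hypothetical toy data, not the study's results.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
y_pred = [int(s >= 0.5) for s in scores]
print(binary_metrics(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3), round(brier_score(y_true, scores), 3))
```

Precision and specificity answer different questions: precision is measured over predicted positives, specificity over actual negatives. When negatives are scarce, a model can therefore keep precision high while still misclassifying most of them.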