Medical image analysis is central to clinical decision-making, and recent advances in vision–language models (VLMs) have introduced promising capabilities for jointly processing visual and textual data. This study evaluates zero-shot VLMs against convolutional neural networks (CNNs) and classical machine learning (CML) models for polyp detection (CADe) and classification (CADx) using 2,258 colonoscopy images from 428 patients with histopathological labels. We benchmarked 15 approaches: ResNet50, five CMLs (random forest, support vector machine, logistic regression, decision tree, Gaussian naive Bayes), two contrastive vision–language encoders (CLIP, BiomedCLIP), and seven frontier VLMs (GPT-4, GPT-4.1, GPT-4.1-mini, Gemma-3-27b, Qwen-2.5-vl-72b, Gemini-1.5-Pro, Claude-3-Opus). For polyp detection, the highest-performing VLMs (GPT-4.1 F1: 91.98%, GPT-4.1-mini F1: 91.16%) matched CNN performance (ResNet50 F1: 91.35%), although performance varied widely across VLMs (F1 range: 19.37% to 91.98%). For classification, CNNs substantially outperformed VLMs: ResNet50 achieved a weighted F1 of 74.94% versus 55.07% for GPT-4.1-mini, and the gap widened sharply for rare polyp subtypes, on which VLMs often achieved 0% F1. External validation on 75 images showed that while ResNet50 performance declined substantially, some VLMs maintained more stable cross-institutional performance. These findings establish a task-dependent performance hierarchy in which VLMs match CNNs for detection but remain limited for classification, suggesting distinct clinical roles for each approach.
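Classification results above are reported as weighted F1, and rare subtypes often scored 0% F1. The sketch below is illustrative only, not the authors' evaluation code. It assumes the standard support-weighted definition (the same averaging as scikit-learn's `f1_score` with `average='weighted'`) and shows why a 0% F1 on a rare class pulls weighted F1 down less than it pulls down an unweighted (macro) average. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {class: F1} for each class present in y_true.

    Uses F1 = 2*TP / (2*TP + FP + FN), which equals the harmonic
    mean of precision and recall.
    """
    scores = {}
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its support (true count)."""
    support = Counter(y_true)
    f1 = per_class_f1(y_true, y_pred)
    n = len(y_true)
    return sum(f1[c] * support[c] / n for c in support)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes present in y_true."""
    f1 = per_class_f1(y_true, y_pred)
    return sum(f1.values()) / len(f1)


if __name__ == "__main__":
    # Hypothetical toy data: 8 common-class images, 2 rare-subtype images,
    # and a classifier that never predicts the rare subtype.
    y_true = ["common"] * 8 + ["rare"] * 2
    y_pred = ["common"] * 10
    print(per_class_f1(y_true, y_pred))          # rare subtype gets F1 = 0.0
    print(round(weighted_f1(y_true, y_pred), 4))  # 0.7111
    print(round(macro_f1(y_true, y_pred), 4))     # 0.4444
```

In this toy case the rare class contributes only 2 of 10 samples, so its 0.0 F1 costs the weighted score far less than the macro score; per-subtype F1 therefore reveals failures that a single weighted figure can mask.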
Mohammad Khalafi
Johns Hopkins University
Seyed Amir Ahmad Safavi‐Naini
Icahn School of Medicine at Mount Sinai
Ameneh Salehi
Shahid Beheshti University of Medical Sciences
Scientific Reports
Columbia University
Icahn School of Medicine at Mount Sinai
Cedars-Sinai Medical Center
Khalafi et al., Scientific Reports.
synapsesocial.com/papers/69db22fb78a3e0e288684e77 — DOI: https://doi.org/10.1038/s41598-025-29566-2