What type of study is this?

This is a quantitative study.

October 11, 2025

Open Access

Large language model for interpreting the Paris classification of colorectal polyps

Key Points

M-LLM distinguished non-polypoid from polypoid lesions with 73% accuracy, similar to experts and non-experts.
Accuracy for sessile vs. pedunculated lesions was only 55%, significantly lower than that of experts and non-experts.
Comparative analysis involved 100 unique colorectal polyps from the SUN dataset, each labeled with Paris classification.
M-LLM performed comparably to endoscopists for certain morphological distinctions but struggled with pedunculated lesions.

Abstract

Reporting of colorectal polyp morphology using the Paris classification is often inaccurate. Multimodal large language models (M-LLMs) may support morphological assessment. This study aimed to evaluate the accuracy of an M-LLM (GPT-4o) in classifying colorectal polyp morphology compared with expert and non-expert endoscopists. We used the SUN dataset of colonoscopy videos from 100 unique colorectal polyps, each labeled with the validated Paris classification. An M-LLM (GPT-4o) classified five representative frames per lesion. Three expert and three non-expert endoscopists, blinded to one another, performed the same task. The primary outcome was accuracy in differentiating non-polypoid (IIa/IIc) from polypoid (Is/Ip/Isp) lesions. The secondary outcome was accuracy in differentiating sessile (Is) from pedunculated (Ip/Isp) lesions. Given the exploratory design, no multiplicity correction was applied; point estimates are presented with 95% confidence intervals (CIs), and P values are interpreted descriptively. M-LLM accuracy for differentiating non-polypoid from polypoid lesions was 73% (95% CI 63%-81%), comparable to experts (75%, 65%-83%; P = 0.84) and non-experts (77%, 68%-85%; P = 0.52), with similar sensitivity and specificity. Accuracy for differentiating sessile from pedunculated lesions was 55% (95% CI 42%-67%), lower than experts (76%; P = 0.02) and non-experts (77%; P = 0.01), primarily due to poor specificity (12% vs. experts 82% and non-experts 88%; P < 0.01 for both comparisons). M-LLMs performed comparably to endoscopists in distinguishing non-polypoid from polypoid lesions but failed to reliably identify pedunculated morphology.
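To make the reported confidence intervals easier to interpret, the following is a minimal sketch of how a 95% interval for an accuracy figure can be computed. It rests on two assumptions that the abstract does not confirm: that the 73% primary-outcome accuracy corresponds to 73 correct classifications out of 100 lesions, and that a Wilson score interval is an acceptable stand-in for whatever interval method the authors actually used. The function name `wilson_interval` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: this is an illustrative method choice; the abstract
    does not say which interval method the study used.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")

    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n

    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed counts: 73 correct out of 100 lesions (inferred from the 73% figure).
low, high = wilson_interval(73, 100)
print(f"Accuracy 73%, 95% CI {low:.1%} to {high:.1%}")
```

Under these assumptions the result lands within about one percentage point of the reported 63%-81% interval; a small difference is expected if the authors used another method, such as an exact binomial interval. The same function cannot be applied to the 55% secondary outcome without knowing how many polypoid lesions were in that comparison, which the abstract does not state.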

Cite This Study

Massimi et al. studied this question.

synapsesocial.com/papers/68e9b1d0ba7d64b6fc132adb https://doi.org/10.1055/a-2703-0209
