A comparative accuracy study of multimodal LLMs, VLM and agent-based framework for pulmonary nodule detection on chest radiographs

Key Points

The accuracy of the assessed models was insufficient for clinical use, warranting further refinement.
The MedRAX framework and the BiomedCLIP model achieved the highest accuracy, but their advantage over the other models was not statistically significant.
The assessment used a small, single-center dataset of PNG images rather than DICOM files.
The absence of a statistically significant difference between proprietary and open-source models suggests that refining open-source models may improve their diagnostic accuracy.

Abstract

The MedRAX framework and the BiomedCLIP vision-language model showed the highest accuracy values. No statistically significant difference was observed between proprietary and open-source models, which may indicate that refining open-source LLM-based models could improve their accuracy. Overall, the accuracy of the evaluated models was insufficient for implementation in current clinical practice. These results should be regarded as exploratory given the small dataset, the single-centre design, the different prompting strategies used for foundation and domain-adapted models, and the use of PNG images instead of DICOM files.
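To illustrate why a small dataset can leave accuracy differences short of statistical significance, the sketch below computes Wilson score intervals for two accuracy values. The counts (70 and 60 correct out of 100) are hypothetical and are not taken from the study, and the summary does not state which statistical test the authors used. Overlapping intervals are only an informal indication; a paired comparison on the same images would normally use a paired test.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for two models on the same 100 radiographs (not from the study).
model_a = wilson_interval(70, 100)
model_b = wilson_interval(60, 100)

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = model_a[0] <= model_b[1] and model_b[0] <= model_a[1]

print(f"Model A accuracy interval: {model_a[0]:.3f} to {model_a[1]:.3f}")
print(f"Model B accuracy interval: {model_b[0]:.3f} to {model_b[1]:.3f}")
print(f"Intervals overlap: {intervals_overlap}")
```

With these assumed counts, a ten-point gap in accuracy still yields overlapping intervals, which is consistent with the caution that the results are exploratory.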
