February 20, 2026 | Open Access

Fine-Tuning a Small Vision Language Model Using Synthetic Data for Explaining Bacterial Skin Disease Images

Key Points

The study aims to adapt a small vision language model for enhanced understanding of bacterial skin disease images.
Curated a dataset derived from PMC-OA using the BIOMEDICA dataset.
Constructed the PMC-derma-VQA-bacteria dataset by pairing images with inherited figure captions and synthetic QA supervision generated by Google's Gemini model.
Fine-tuned SmolVLM under three supervision settings: QA-only, caption-only, and combined QA+caption.
Evaluated models on a held-out test set for text generation and classification tasks.
QA-only supervision achieved the best report generation performance.
The combined QA+caption setting resulted in the highest classification accuracy of 70.20%.
Synthetic QA supervision can meaningfully enhance compact VLMs for medical image understanding and diagnostic support in dermatology.

Abstract

Background/Objectives: Vision language models (VLMs) show strong potential for medical image understanding, but their large scale often limits practical deployment. This study investigates whether a compact VLM can be effectively adapted for dermatology, with a focus on explaining bacterial skin disease images.

Methods: We curate a dataset derived from PMC-OA using the BIOMEDICA dataset and construct PMC-derma-VQA-bacteria by pairing images with inherited figure captions and synthetically generated question–answer (QA) supervision produced by Google’s Gemini model. SmolVLM is fine-tuned under three supervision settings: QA-only, caption-only, and a combined QA+caption strategy. The models are evaluated on a held-out test set for both text-generation quality and diagnostic classification performance.

Results: QA-only supervision yields the best report-generation performance, while the combined QA+caption setting achieves the highest classification accuracy (70.20%).

Conclusions: Synthetic QA supervision can meaningfully enhance compact VLMs for medical image understanding and diagnostic support in dermatology.
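
The three supervision settings named in the Methods (QA-only, caption-only, and combined QA+caption) can be pictured as three ways of turning one image record into training examples. The Python sketch below is illustrative only: the abstract does not describe the data schema or prompt templates, so the `Record` fields, the `CAPTION_PROMPT` text, the setting names, and the `build_examples` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One dataset entry. Field names are assumptions, not the paper's schema."""
    image_path: str
    caption: str
    qa_pairs: list = field(default_factory=list)  # list of (question, answer) tuples


# Hypothetical instruction used to elicit a caption; the real prompt is not given.
CAPTION_PROMPT = "Describe this image."

SETTINGS = {"qa", "caption", "qa+caption"}


def build_examples(record, setting):
    """Expand one record into (image, prompt, target) examples for a setting."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown supervision setting: {setting!r}")

    examples = []
    if setting in {"qa", "qa+caption"}:
        # One example per synthetic question-answer pair.
        for question, answer in record.qa_pairs:
            examples.append(
                {"image": record.image_path, "prompt": question, "target": answer}
            )
    if setting in {"caption", "qa+caption"}:
        # One example that asks the model to reproduce the inherited caption.
        examples.append(
            {"image": record.image_path, "prompt": CAPTION_PROMPT, "target": record.caption}
        )
    return examples
```

Under these assumptions, the combined setting is simply the union of the other two, so a record with two QA pairs contributes two examples under QA-only, one under caption-only, and three under QA+caption.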

Authors

S. Zhang

Abdurrahim Yilmaz

Gülsüm Gençoğlan

Journals

Diagnostics

Institutions

Imperial College London

Turkish Society of Cardiology
