Background/Objectives: Vision-language models (VLMs) show strong potential for medical image understanding, but their large scale often limits practical deployment. This study investigates whether a compact VLM can be effectively adapted for dermatology, with a focus on explaining bacterial skin disease images. Methods: We curate data from PMC-OA via the BIOMEDICA dataset and construct PMC-derma-VQA-bacteria by pairing each image with its inherited figure caption and with synthetic question–answer (QA) supervision generated by Google’s Gemini model. SmolVLM is fine-tuned under three supervision settings: QA-only, caption-only, and combined QA+caption. The models are evaluated on a held-out test set for both text-generation quality and diagnostic classification performance. Results: QA-only supervision yields the best report-generation performance, while the combined QA+caption setting achieves the highest classification accuracy (70.20%). Conclusions: Synthetic QA supervision can meaningfully enhance compact VLMs for medical image understanding and diagnostic support in dermatology.
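The three supervision settings named in the Methods can be illustrated with a minimal sketch of how training examples might be assembled per image. This is not the authors' pipeline: the `Record` type, the `build_examples` helper, the setting names, and the caption prompt are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Record:
    """One image with its inherited caption and synthetic QA pairs (hypothetical schema)."""
    image_path: str
    caption: str
    qa_pairs: List[Tuple[str, str]]


# Assumed instruction used to turn a caption into a supervised target.
CAPTION_PROMPT = "Describe this image."

SETTINGS = ("qa", "caption", "qa+caption")


def build_examples(record: Record, setting: str) -> List[Dict[str, str]]:
    """Return (image, prompt, target) training examples for one supervision setting."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")

    examples: List[Dict[str, str]] = []

    # QA supervision: one example per synthetic question-answer pair.
    if setting in ("qa", "qa+caption"):
        for question, answer in record.qa_pairs:
            examples.append(
                {"image": record.image_path, "prompt": question, "target": answer}
            )

    # Caption supervision: one example with the inherited figure caption as target.
    if setting in ("caption", "qa+caption"):
        examples.append(
            {"image": record.image_path, "prompt": CAPTION_PROMPT, "target": record.caption}
        )

    return examples
```

Under this sketch, the combined setting simply concatenates the QA examples and the caption example for each image; the study may mix or weight them differently.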
S. Zhang
Abdurrahim Yilmaz
Gülsüm Gençoğlan
Diagnostics
Imperial College London
Turkish Society of Cardiology
Zhang et al. studied this question.
www.synapsesocial.com/papers/6997fa90ad1d9b11b3453dbf — DOI: https://doi.org/10.3390/diagnostics16040603