Training medical vision-language models (VLMs) typically demands millions of image-text pairs to achieve versatility and reasoning ability, which poses significant challenges for data acquisition. We propose ConceptVLM, a novel data-efficient fine-tuning paradigm that transforms general-domain VLMs into specialized medical ones with minimal labeled data, integrating medical knowledge without disrupting the model's existing general capabilities. Central to our approach is a key-concept-aware training strategy that builds a structured medical concept dictionary and employs masked attention to guide the model's focus toward essential clinical concepts. This focused fine-tuning enhances domain-specific comprehension while preserving the model's reasoning abilities and response diversity. Experiments across multimodal medical benchmarks show that ConceptVLM achieves state-of-the-art results using only 1% of the original training data, outperforming traditional methods that rely on large-scale QA datasets. These findings challenge the prevailing reliance on extensive annotated corpora and demonstrate that key-concept-guided tuning is a viable path to developing cognitively capable medical VLMs.
Wei Lou
Yue Wu
Pusheng Xu
École Polytechnique Fédérale de Lausanne
Hong Kong Polytechnic University
Zhejiang Normal University
Lou et al. studied this question.
www.synapsesocial.com/papers/69f5947e71405d493afff41e — DOI: https://doi.org/10.1038/s41746-026-02676-5