What question did this study set out to answer?

This research aims to enhance the training of medical vision-language models by using minimal labeled data while maintaining reasoning capabilities.

May 2, 2026

Key concept learning for medical vision language model with reasoning capabilities.

Key Points

Developed ConceptVLM using a key concept-aware training strategy.
Built a structured medical concept dictionary to guide focus during fine-tuning.
Employed masked attention to enhance comprehension of essential clinical concepts.
Achieved state-of-the-art results with only 1% of the original training data.
Outperformed traditional methods dependent on large-scale question-and-answer datasets.

Abstract

Training medical vision-language models (VLMs) typically demands millions of image-text pairs to achieve versatility and reasoning, posing significant challenges in data acquisition. We propose ConceptVLM, a novel data-efficient fine-tuning paradigm that transforms general-domain VLMs into specialized medical ones with minimal labeled data, integrating medical knowledge without disrupting the model's existing general capabilities. Central to our approach is a key concept-aware training strategy, building a structured medical concept dictionary and employing masked attention to guide the model's focus toward essential clinical concepts. This focused fine-tuning enhances domain-specific comprehension while preserving the model's reasoning abilities and response diversity. Experiments across multimodal medical benchmarks show ConceptVLM achieves state-of-the-art results using only 1% of the original training data, outperforming traditional methods reliant on large-scale QA datasets. These findings challenge the prevailing reliance on extensive annotated corpora, demonstrating key concept-guided tuning as a viable path to developing cognitively capable medical VLMs.
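The abstract does not spell out how the concept dictionary and the masked attention are implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a small hand-written dictionary (`CONCEPT_DICT`, with illustrative entries), whitespace tokenization, and a plain masked softmax that restricts attention weights to tokens matching a dictionary term. The names `concept_mask` and `masked_softmax` are hypothetical.

```python
import math

# Hypothetical concept dictionary: category -> key terms (illustrative entries only).
CONCEPT_DICT = {
    "finding": ["pleural effusion", "nodule"],
    "anatomy": ["left lung", "heart"],
}

def concept_mask(tokens, concept_dict):
    """Return a 0/1 list marking tokens that are part of any dictionary term."""
    mask = [0] * len(tokens)
    # Normalize tokens so that "lung." matches the dictionary word "lung".
    lowered = [t.lower().strip(".,;:") for t in tokens]
    for terms in concept_dict.values():
        for term in terms:
            words = term.split()
            n = len(words)
            # Slide a window over the tokens to match multi-word terms.
            for i in range(len(lowered) - n + 1):
                if lowered[i:i + n] == words:
                    for j in range(i, i + n):
                        mask[j] = 1
    return mask

def masked_softmax(scores, mask):
    """Softmax over scores, restricted to positions where mask == 1."""
    if not any(mask):
        mask = [1] * len(scores)  # no concept found: fall back to ordinary softmax
    top = max(s for s, m in zip(scores, mask) if m)  # subtract max for stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

tokens = "Small pleural effusion in the left lung.".split()
mask = concept_mask(tokens, CONCEPT_DICT)            # [0, 1, 1, 0, 0, 1, 1]
weights = masked_softmax([0.0] * len(tokens), mask)  # 0.25 on each concept token
```

In this toy setup all attention mass lands on the four tokens that belong to a key concept. A real system would apply such a mask inside the model's attention layers or loss, and over subword tokens rather than whitespace-split words.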

Authors

Wei Lou

Yue Wu

Pusheng Xu

Institutions

École Polytechnique Fédérale de Lausanne

Hong Kong Polytechnic University

Zhejiang Normal University
