Medical Visual Question Answering (MedVQA) applies computer vision and natural language processing techniques to assist clinical decision-making. However, existing models frequently suffer from limited multimodal interaction, insufficient guidance from external medical knowledge, and a lack of rigorous diagnostic logic in their responses. To address these issues, we propose KGLMQA, a novel framework that integrates knowledge graphs with Large Language Models (LLMs). The framework comprises three core components: a high-precision MedVQA classification model that uses gating mechanisms and multi-stage feature fusion; a Knowledge Graph Retrieval Augmented Generation (KGRAG) module that dynamically retrieves and refines structured medical knowledge; and an LLM that generates professional responses from structured prompts. Experimental results on the Patient-oriented Visual Question Answering (P-VQA), Visual Question Answering in Radiology (VQA-RAD), and Semantically-Labeled Knowledge-Enhanced (SLAKE) datasets show that KGLMQA achieves state-of-the-art performance on metrics such as accuracy and precision. Notably, it outperforms the fully fine-tuned Large Language-and-Vision Assistant for Biomedicine (LLaVA-Med) model on open-ended questions in the VQA-RAD dataset. Further error propagation analysis and a comparative evaluation against Generative Pre-trained Transformer 4o (GPT-4o) show that KGLMQA is robust to upstream visual noise and surpasses GPT-4o in the logical coherence of its diagnostic reasoning. These findings indicate that integrating visual diagnostic cues and explicit structured knowledge with LLMs significantly enhances the interpretability and clinical application potential of MedVQA systems.
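The retrieve-then-prompt flow outlined in the abstract (a classifier prediction, knowledge graph lookup, then a structured prompt for the LLM) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy graph, the function names `retrieve_triples` and `build_prompt`, and the prompt wording are all hypothetical, and the refinement step of the KGRAG module is reduced here to a simple cap on the number of triples.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical toy knowledge graph; the entries are illustrative only
# and are not taken from the paper or its datasets.
TOY_KG: List[Triple] = [
    ("pneumonia", "has_symptom", "cough"),
    ("pneumonia", "imaging_finding", "lung consolidation"),
    ("fracture", "imaging_finding", "cortical discontinuity"),
]


def retrieve_triples(kg: List[Triple], entity: str, max_triples: int = 5) -> List[Triple]:
    """Return up to max_triples triples whose head matches the entity (case-insensitive)."""
    key = entity.strip().lower()
    return [t for t in kg if t[0].lower() == key][:max_triples]


def build_prompt(question: str, predicted_answer: str, triples: List[Triple]) -> str:
    """Assemble a structured prompt from the question, the classifier output, and KG facts."""
    facts = "\n".join(
        f"- {head} {relation.replace('_', ' ')} {tail}"
        for head, relation, tail in triples
    ) or "- (no facts retrieved)"
    return (
        f"Question: {question}\n"
        f"Classifier prediction: {predicted_answer}\n"
        f"Retrieved knowledge:\n{facts}\n"
        "Answer step by step, citing the retrieved knowledge."
    )


if __name__ == "__main__":
    predicted = "Pneumonia"  # assumed output of an upstream MedVQA classifier
    prompt = build_prompt(
        "What abnormality is shown in the image?",
        predicted,
        retrieve_triples(TOY_KG, predicted),
    )
    print(prompt)  # this string would then be passed to an LLM of choice
```

Keeping retrieval and prompt assembly as separate functions means the upstream prediction can be swapped or perturbed independently, which is the kind of setup an error propagation analysis would need.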
Wenhu Wang
Huina Liu
Changfa Wei
PeerJ Computer Science
Hunan University of Traditional Chinese Medicine
Wang et al. studied this question.
www.synapsesocial.com/papers/69a91dedd6127c7a504c1487 — DOI: https://doi.org/10.7717/peerj-cs.3679