What question did this study set out to answer?

The research aims to develop an interpretable framework for chest X-ray analysis that integrates clinical knowledge and improves model transparency.

March 17, 2026 | Open Access

Med-ViX-Ray: Enhancing explainable chest X-ray analysis with clinical knowledge graphs

Key Points

The research aims to develop an interpretable framework for chest X-ray analysis that integrates clinical knowledge and improves model transparency.
Introduced the Med-ViX-Ray framework, which combines vision Transformers with clinical knowledge graphs.
Implemented a graph-informed attention bias and a region-aware nudging module to enhance predictions.
Trained and internally validated the model on MIMIC-CXR, then tested generalization on VinDR-CXR and RSNA Pneumonia.
Achieved improved F1-score (0.561 vs. 0.456) and recall (0.715 vs. 0.466) compared to a strong baseline.
Maintained competitive overall performance and enhanced interpretability with grounded clinical relevance.
Qualitative analyses confirm that the model effectively highlights significant regions and signs relevant to radiology.
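
The graph-informed attention bias mentioned in the key points can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes the bias is added to raw attention scores before the softmax, that each image patch has already been matched to at most one sign node, and that two patches receive a bonus when their signs are identical or linked in the graph. The names `patch_sign`, `graph_edges`, and `beta` are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def graph_bias(patch_sign, graph_edges, beta):
    """Build a patch-by-patch bias matrix (assumed form, not from the paper).

    A pair of patches gets bias `beta` when both are matched to a sign and
    the two signs are equal or connected by an edge; otherwise the bias is 0.
    """
    def related(a, b):
        if a is None or b is None:
            return False
        return a == b or frozenset((a, b)) in graph_edges

    return [[beta if related(a, b) else 0.0 for b in patch_sign]
            for a in patch_sign]

def biased_attention(scores, bias):
    """Add the bias to the raw scores, then normalise each row."""
    return [softmax([s + b for s, b in zip(s_row, b_row)])
            for s_row, b_row in zip(scores, bias)]

# Hypothetical toy input: three patches, the third matched to no sign.
patch_sign = ["opacity", "consolidation", None]
graph_edges = {frozenset(("opacity", "consolidation"))}

scores = [[0.0, 0.0, 0.0] for _ in range(3)]  # uniform raw attention
attn = biased_attention(scores, graph_bias(patch_sign, graph_edges, beta=1.0))

for row in attn:
    print([round(p, 3) for p in row])
```

With uniform raw scores, the first two patches end up attending more to each other than to the unmatched third patch, which is the qualitative effect a knowledge-derived bias is meant to have.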

Abstract

Deep learning has achieved remarkable success in chest X-ray interpretation, yet most models remain black boxes, producing accurate predictions without exposing the clinical reasoning behind them. This opacity limits trust and adoption in real-world practice. We introduce Med-ViX-Ray, a knowledge-guided and interpretable framework that integrates symbolic clinical reasoning into a vision Transformer backbone. The model leverages a structured graph of radiological signs and conditions, aligning image attention maps with domain knowledge through a probabilistic soft-matching module and a nudging mechanism that refines classifier outputs. This dual integration allows predictions to be explained in terms of clinically meaningful signs and corresponding image regions, offering transparency beyond post-hoc heatmaps. We evaluated Med-ViX-Ray on MIMIC-CXR for training and internal validation, and tested its generalization on the VinDR-CXR and RSNA Pneumonia benchmarks. The proposed method improves recall and F1-score compared to a strong SwinV2 baseline (Med-ViX-Ray vs. baseline, respectively: F1-micro 0.561 vs. 0.456; precision 0.462 vs. 0.529; recall 0.715 vs. 0.466; ROC 0.788 vs. 0.744), while maintaining competitive overall performance. Qualitative analyses confirm that the model highlights clinically relevant regions and sign activations aligned with radiological practice. These results suggest that knowledge-guided attention and sign-based explanations can enhance interpretability and recall in chest X-ray classification models. Future work will extend the framework toward report generation and prospective clinical evaluation.

• We propose a novel knowledge-guided architecture combining vision Transformers with a clinical graph for chest X-ray interpretation.
• We introduce a dual integration strategy: a graph-informed attention bias and a region-aware nudging module, both grounded in radiological semantics.
• We develop an interpretable pipeline based on attention map reconstruction, region analysis, and feature-to-node matching with weak supervision.
• We show improved classification performance, interpretability, and zero-shot generalization on two large-scale datasets.
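
As a quick reading aid for the reported metrics, the sketch below applies the standard F1 definition (the harmonic mean of precision and recall) to the precision and recall values quoted in the abstract. It assumes that the quoted precision and recall are averaged the same way as the F1-micro score, which the abstract does not state, and it reads the baseline precision as 0.529.

```python
def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values quoted in the abstract (baseline precision read as 0.529).
med_vix_ray_f1 = f1_from(0.462, 0.715)
swinv2_f1 = f1_from(0.529, 0.466)

print(round(med_vix_ray_f1, 3))  # 0.561
print(round(swinv2_f1, 3))       # 0.496
```

The Med-ViX-Ray value matches the reported 0.561. The baseline value comes out near 0.496 rather than the reported 0.456, which may simply mean that the baseline precision and recall were averaged differently from its F1-micro score; the abstract gives no further detail.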
