Deep learning has achieved remarkable success in chest X-ray interpretation, yet most models remain black boxes that produce accurate predictions without exposing the clinical reasoning behind them. This opacity limits trust and adoption in real-world practice. We introduce Med-ViX-Ray, a knowledge-guided, interpretable framework that integrates symbolic clinical reasoning into a vision Transformer backbone. The model leverages a structured graph of radiological signs and conditions, aligning image attention maps with domain knowledge through a probabilistic soft-matching module and a nudging mechanism that refines classifier outputs. This dual integration allows predictions to be explained in terms of clinically meaningful signs and their corresponding image regions, offering transparency beyond post-hoc heatmaps. We trained and internally validated Med-ViX-Ray on MIMIC-CXR and tested its generalization on the VinDr-CXR and RSNA Pneumonia benchmarks. Compared with a strong SwinV2 baseline, the proposed method improves micro-F1 (0.561 vs. 0.456), recall (0.715 vs. 0.466), and ROC AUC (0.788 vs. 0.744), with lower precision (0.462 vs. 0.529), while maintaining competitive overall performance. Qualitative analyses confirm that the model highlights clinically relevant regions and that its sign activations align with radiological practice. These results suggest that knowledge-guided attention and sign-based explanations can enhance interpretability and recall in chest X-ray classification models. Future work will extend the framework toward report generation and prospective clinical evaluation.

• We propose a novel knowledge-guided architecture that combines vision Transformers with a clinical graph for chest X-ray interpretation.
• We introduce a dual integration strategy: a graph-informed attention bias and a region-aware nudging module, both grounded in radiological semantics.
• We develop an interpretable pipeline based on attention map reconstruction, region analysis, and feature-to-node matching with weak supervision.
• We show improved classification performance, interpretability, and zero-shot generalization on two large-scale datasets.
Cieri et al. (Sun,) studied this question.