The growing volume of medical data presents significant opportunities for advancing Medical Visual Question Answering (MVQA) systems. However, an imbalance in the number and distribution of images and Question–Answer (QA) pairs poses challenges for developing robust models. This study proposes improving existing MVQA datasets with data augmentation techniques, specifically Mixup and Label Smoothing, to address this issue. The performance of MVQA models trained on these enhanced datasets is evaluated using quantitative metrics as well as Layer-wise Relevance Propagation for eXplainable artificial intelligence (LRP XAI). Results indicate that models trained on the augmented datasets outperform those trained on the baseline datasets, showing significant gains in both accuracy and Bilingual Evaluation Understudy (BLEU) score. Furthermore, LRP XAI visualizations highlight the key image and text regions that contribute to accurate answer predictions, thereby improving model interpretability and trust. This work underscores the importance of dataset augmentation and explainability in advancing MVQA research. It is available at https://doi.org/10.5281/zenodo.15910714.
Mohamed et al. (Mon,) studied this question.
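For readers unfamiliar with the two augmentation techniques named above, the following is a minimal sketch of their standard textbook formulations in NumPy. It is not the authors' implementation: the function names `smooth_labels` and `mixup`, the values `epsilon=0.1` and `alpha=0.4`, and the assumption that inputs are fixed-size feature arrays with one-hot answer labels are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: shift a fraction epsilon of the probability
    mass from the true class to a uniform distribution over classes.
    epsilon=0.1 is an illustrative default, not a value from the paper."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes


def mixup(features, targets, alpha=0.4, rng=None):
    """Mixup: blend each sample with a randomly chosen partner from the
    same batch, using one weight lam drawn from Beta(alpha, alpha).
    alpha=0.4 is an illustrative default, not a value from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(features))
    mixed_features = lam * features + (1.0 - lam) * features[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_features, mixed_targets, lam


# Hypothetical batch: 4 samples, 3 features each, 4 candidate answers.
labels = smooth_labels(np.eye(4), epsilon=0.1)
batch = np.arange(12, dtype=float).reshape(4, 3)
mixed_x, mixed_y, lam = mixup(batch, labels, rng=np.random.default_rng(0))
```

Both transforms keep every target row a valid probability distribution, so the mixed targets can be used directly with a soft-label cross-entropy loss. How such mixing would be applied to the image and question inputs of an MVQA model is a design decision that this sketch leaves open.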