Caption-Aware Medical VQA via Semantic Focusing and Progressive Cross-Modality Comprehension | Synapse