Double‐Attention Transformer for Cross‐Modal Image Captioning: Enhancing Visual–Linguistic Alignment on Low‐Resource Datasets | Synapse