Policy Learning-Based Image Captioning With Vision Transformer | Synapse