VLRM: Vision-Language Models act as Reward Models for Image Captioning