Adding Multimodal Capabilities to a Text-only Translation Model | Synapse