Multimodal Feature Learning for Video Captioning | Synapse