Multi-Task Video Captioning with a Stepwise Multimodal Encoder | Synapse