Ensemble-based multilingual video captioning with multimodel fusion of visual and audio cues | Synapse