Learning to enhance aerial video captioning with visual question answering | Synapse