A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene