Vision-Language Models in the Era of Multimodal Foundation Models | Synapse