Efficient Vision-Language Pre-training by Cluster Masking | Synapse