Key points are not available for this paper at this time.
Multimodal contrastive learning aims to train a general-purpose feature extractor, such as CLIP, on vast amounts of raw, unlabeled paired image-text data. This can greatly benefit various complex downstream tasks, including cross-modal image-text retrieval and image classification. Despite its promising prospect, the security issue of cross-modal pre-trained encoder has not been fully explored yet, especially when the pre-trained encoder is publicly available for commercial use.
Building similarity graph...
Analyzing shared references across papers
Loading...
Zhou et al. (Thu,) studied this question.
www.synapsesocial.com/papers/69de77dbbf539e2270558a6c — DOI: https://doi.org/10.1145/3581783.3612454
Ziqi Zhou
Shengshan Hu
Minghui Li
Huazhong University of Science and Technology
Building similarity graph...
Analyzing shared references across papers
Loading...