June 23, 2024 · Open Access

Multimodal Multilabel Classification by CLIP

Abstract

Multimodal multilabel classification (MMC) is a challenging task that requires a learning algorithm to handle two data sources, image and text, and to learn a comprehensive semantic feature representation across the modalities. In this work, we review a broad range of state-of-the-art approaches to MMC and leverage a novel technique that uses Contrastive Language-Image Pre-training (CLIP) as the feature extractor, fine-tuning the model while exploring different classification heads, fusion methods and loss functions. Our best result achieved an F₁ score above 90% on the public Kaggle competition leaderboard. This paper provides detailed descriptions of the training methods and a quantitative analysis of the experimental results.
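To make the pipeline named in the abstract concrete, here is a minimal sketch of one possible combination: concatenation fusion, a linear sigmoid head, binary cross-entropy loss, and micro-averaged F₁ at a 0.5 threshold. These specific choices are assumptions for illustration; this page does not state which heads, fusion methods, or losses the paper settled on. The feature vectors are placeholder numbers standing in for CLIP image and text embeddings, and every function name and weight below is hypothetical.

```python
import math

def concat_fusion(image_feat, text_feat):
    """Fuse two embeddings by concatenation (assumed fusion method)."""
    return list(image_feat) + list(text_feat)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_head(fused, weights, biases):
    """One independent sigmoid output per label (multilabel, so no softmax)."""
    return [sigmoid(sum(w * x for w, x in zip(row, fused)) + b)
            for row, b in zip(weights, biases)]

def bce_loss(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over the labels of one sample."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / len(targets)

def micro_f1(pred_rows, true_rows):
    """Micro-averaged F1 over all (sample, label) pairs."""
    tp = fp = fn = 0
    for pred, true in zip(pred_rows, true_rows):
        for p, t in zip(pred, true):
            tp += p == 1 and t == 1
            fp += p == 1 and t == 0
            fn += p == 0 and t == 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Placeholder vectors standing in for CLIP embeddings (hypothetical values).
image_feat = [0.2, -0.1]
text_feat = [0.4, 0.3]
fused = concat_fusion(image_feat, text_feat)

# Hypothetical, untrained head for three labels.
weights = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, -1.0],
           [0.5, 0.5, 0.5, 0.5]]
biases = [0.0, 0.0, -1.0]

probs = linear_head(fused, weights, biases)
preds = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 decision threshold
targets = [1, 0, 0]

loss = bce_loss(probs, targets)
f1 = micro_f1([preds], [targets])
```

In a real setup the placeholder vectors would come from a pretrained CLIP encoder and the head would be trained by gradient descent on the loss; the sketch only shows how the pieces fit together.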

Cite This Study

Yanming Guo studied this question.

synapsesocial.com/papers/68e639f7b6db6435875cc2fa https://doi.org/10.48550/arxiv.2406.16141
