What does this research mean for the field?

The Optimal Transport Dual Prompt Personalization (OTDPP) framework significantly improves the adaptability and performance of federated CLIP fine-tuning compared with classic prompt tuning methods. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to enhance the fine-tuning of CLIP models in federated settings by addressing alignment and adaptability issues.

February 28, 2026 · Open Access

A Federated CLIP Fine-Tuning Method Based on Optimal Transport and Dual Prompt Personalization

Key Points

Developed the Optimal Transport Dual Prompt Personalization (OTDPP) framework.
Injected prompt parameters into visual and text encoders.
Achieved alignment through optimal transport.
Designed a dual prompt tuning mechanism with global and local parts.
Conducted extensive experiments comparing OTDPP against classic prompt tuning methods.
OTDPP reduces computational and communication overhead.
Maintains client-specific personalized features.
Significantly improves model adaptability and overall performance.
Shows broad application potential for CLIP in various professional fields.
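The summary does not spell out OTDPP's alignment objective, so the following is only a generic sketch of entropic optimal transport (Sinkhorn iterations) between image-patch features and text-token features. It assumes a cosine-distance cost and uniform marginals; the function names, the `eps` value, and the toy features are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical sketch: entropic optimal transport (Sinkhorn) between
# image-patch features and text-token features; not necessarily the OTDPP formulation.


def cosine_cost(X, Y):
    """Cost matrix C[i][j] = 1 - cos(X[i], Y[j]); lower cost means more similar."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    return [
        [1.0 - sum(p * q for p, q in zip(x, y)) / (norm(x) * norm(y)) for y in Y]
        for x in X
    ]


def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Approximate the entropic OT plan P with row sums a and column sums b.

    Smaller eps gives a sharper plan (closer to exact OT) but converges more slowly.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_distance(plan, cost):
    """Total transport cost <P, C>; one common choice for a fine-grained alignment loss."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


# Toy features (hypothetical): 3 image patches, 2 text tokens, uniform marginals.
image_patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text_tokens = [[1.0, 0.1], [0.1, 1.0]]
C = cosine_cost(image_patches, text_tokens)
P = sinkhorn(C, a=[1 / 3] * 3, b=[1 / 2] * 2)
print(round(ot_distance(P, C), 4))
```

In the resulting plan, the patch that points along the first axis sends most of its mass to the first token and vice versa, which is the kind of patch-to-token correspondence that a single global image-text similarity score cannot express.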

Abstract

The Contrastive Language-Image Pre-training (CLIP) model uses contrastive learning to align image and text representations, and fine-tuning CLIP with federated learning can extend its application to professional fields. However, federated CLIP fine-tuning faces two key challenges: insufficient alignment of fine-grained semantics between the vision and text modalities, and poor adaptability to non-independent and identically distributed (non-IID) data. This paper proposes the Optimal Transport Dual Prompt Personalization (OTDPP) framework, which injects prompt parameters into the deep layers of both the visual and text encoders, achieves fine-grained cross-modal alignment through optimal transport, and introduces a dual prompt tuning mechanism. This mechanism splits the prompt parameters into a shared global part, aggregated by the server, and a private local part, retained by each client, enabling personalized adaptation without updating the large backbone encoders. Extensive experiments show that, compared with classic prompt tuning baselines, OTDPP reduces computational and communication overhead, retains client-specific personalized features, and significantly improves model adaptability and performance, demonstrating broad application prospects.
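The global/local split described above can be pictured with a minimal sketch of one federated round. It assumes FedAvg-style, sample-weighted averaging on the server and assumes the full prompt is the global part concatenated with the local part; the class and function names are hypothetical, and the paper's actual aggregation rule and prompt layout may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch of one federated round with split prompts; the names and
# the sample-weighted averaging rule are assumptions, not the OTDPP code.


@dataclass
class ClientPrompts:
    global_part: list[float]  # shared prompt parameters, uploaded and aggregated
    local_part: list[float]   # private prompt parameters, never leave the client
    num_samples: int          # local dataset size, used as the aggregation weight


def aggregate_global(clients: list[ClientPrompts]) -> list[float]:
    """Server step: sample-weighted average of the global prompt parts only."""
    total = sum(c.num_samples for c in clients)
    dim = len(clients[0].global_part)
    return [
        sum(c.global_part[k] * c.num_samples for c in clients) / total
        for k in range(dim)
    ]


def broadcast(clients: list[ClientPrompts], new_global: list[float]) -> None:
    """Client step: overwrite the global part; the local part is kept as-is."""
    for c in clients:
        c.global_part = list(new_global)


def full_prompt(c: ClientPrompts) -> list[float]:
    """Prompt fed to the frozen encoders (assumed layout): global part, then local part."""
    return c.global_part + c.local_part


clients = [
    ClientPrompts(global_part=[1.0, 2.0], local_part=[0.5], num_samples=100),
    ClientPrompts(global_part=[3.0, 4.0], local_part=[-0.5], num_samples=300),
]
new_global = aggregate_global(clients)
broadcast(clients, new_global)
print(new_global)               # [2.5, 3.5]
print(full_prompt(clients[0]))  # [2.5, 3.5, 0.5]
```

In this sketch only the global prompt vector crosses the network each round, and the backbone encoders are never transmitted, which illustrates where the reduced communication overhead comes from; the local part stays on each client and carries its non-IID specialization.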


Cite This Study

Shi et al. (2026) studied this question.

synapsesocial.com/papers/69a287f20a974eb0d3c03d38 https://doi.org/10.3390/electronics15050972
