Federated multimodal learning enables decentralized devices with diverse modalities to collaboratively train multimodal models without sharing their raw data. Most existing federated multimodal learning approaches require multimodal data. In practice, however, a considerable number of devices can collect only unimodal data, and the data labels are often incomplete. This paper therefore proposes FedCMD, a federated multimodal learning approach that enables unimodal devices with missing labels to collaboratively train a multimodal model. To make effective use of the unlabeled samples, FedCMD first performs unimodal federated learning to learn unimodal encoders and to generate pseudo-labels for those samples. It then computes the prototypes of the various modalities and shares them among devices for cross-modal feature alignment. Finally, the prototypes stand in for the missing modalities when learning a multimodal fusion and classification network. With FedCMD, the learned multimodal network supports both unimodal inputs and multimodal inputs with any missing modalities. Extensive experiments demonstrate the efficacy of FedCMD compared to state-of-the-art baselines.
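The prototype step described above can be illustrated with a minimal sketch. It is not the FedCMD implementation: it assumes a prototype is the per-class mean of encoder features, and that a missing modality is filled in by concatenating the other modality's prototype for the sample's (pseudo-)label. The function names `class_prototypes` and `fill_missing_modality`, the toy feature values, and the concatenation-based fusion input are all hypothetical choices made for illustration.

```python
def class_prototypes(features, labels):
    """Return {label: mean feature vector} for one modality (assumed definition)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def fill_missing_modality(own_features, labels, other_protos):
    """Append the other modality's prototype for each sample's (pseudo-)label."""
    return [list(vec) + list(other_protos[label])
            for vec, label in zip(own_features, labels)]


# Hypothetical image-only device: three encoded samples over two classes.
image_feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
image_labels = [0, 0, 1]
image_protos = class_prototypes(image_feats, image_labels)

# Hypothetical audio-only device: one sample whose pseudo-label is class 1.
audio_feats = [[0.5, 0.5, 0.5]]
audio_pseudo_labels = [1]
fused_input = fill_missing_modality(audio_feats, audio_pseudo_labels, image_protos)
```

In this toy setup, the audio-only device never sees image data; it only receives the shared image prototypes, which is consistent with the abstract's point that raw data stays on the device. How prototypes are aggregated across devices and how the fusion network consumes them are not specified in the abstract, so those parts are left out.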
Deng et al. (Sat,) studied this question.