Towards Multi-modal Transformers in Federated Learning | Synapse