Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers