Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals