Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient | Synapse