Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation