Using Denoising Diffusion Model for Predicting Global Style Tokens in an Expressive Text-to-Speech System | Synapse