Hierarchical Text-Conditional Image Generation with CLIP Latents | Synapse