Sound of Vision: Audio Generation from Visual Text Embedding through Training Domain Discriminator | Synapse