Improving Audio Generation with Visual Enhanced Caption | Synapse