Video-to-Audio Generation with Hidden Alignment | Synapse