Enhancing Audio–Visual Synchronization and Spatiotemporal Expressiveness for Talking Face Generation | Synapse