Multimodal Learning for Temporally Coherent Talking Face Generation With Articulator Synergy