Towards Human-Centered and Efficient Video Synthesis: A Survey of Multimodal Diffusion Models | Synapse