Personalized marketing content has become a key driver of customer engagement on modern digital platforms. However, generating customized video content at scale remains a significant challenge because each video must be adapted dynamically and without manual intervention. This paper proposes an automated and scalable system for generating personalized marketing videos from structured client profiles and predefined multimedia templates. The pipeline uses Whisper-based speech transcription and Coqui TTS voice cloning to perform CSV-driven keyword replacement, automatically generating and delivering one personalized video per client entry. The approach integrates client data preprocessing, dynamic content selection, and automated video composition within a unified framework. Experimental validation confirms the feasibility and robustness of the proposed system, demonstrating that it can efficiently generate customized marketing videos while remaining scalable and flexible enough for real-world deployment.
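To make the CSV-driven keyword replacement step concrete, the following is a minimal sketch of how a template transcript might be rewritten once per client row. It is an illustration under stated assumptions, not the system's actual implementation: the function names, the `keyword_map` structure, and the CSV column names (`client_id`, `name`, `product`) are all hypothetical. The transcript is assumed to have already been produced by the speech transcription stage, and the resulting text would then be handed to the voice cloning and video composition stages, which are not shown.

```python
import csv
import io
import re


def personalize_transcript(transcript, keyword_map, client):
    """Replace each template keyword with the value of the mapped CSV column.

    keyword_map (hypothetical structure): lowercase keyword -> CSV column name.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keyword_map) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        # Look up which CSV column supplies the replacement for this keyword.
        column = keyword_map[match.group(0).lower()]
        return client[column]

    return pattern.sub(swap, transcript)


def scripts_for_clients(transcript, keyword_map, csv_file):
    """Return one (client_id, personalized script) pair per CSV row."""
    reader = csv.DictReader(csv_file)
    return [
        (row["client_id"], personalize_transcript(transcript, keyword_map, row))
        for row in reader
    ]


# Illustrative data only; the column names and values are assumptions.
template_text = "Hello Alex, enjoy our offer on laptops."
keywords = {"alex": "name", "laptops": "product"}
clients_csv = io.StringIO(
    "client_id,name,product\n"
    "1,Maria,cameras\n"
    "2,Chen,headphones\n"
)

for client_id, script in scripts_for_clients(template_text, keywords, clients_csv):
    print(client_id, script)
```

Matching on word boundaries keeps a keyword from being replaced inside a longer word. In a full pipeline, each personalized script would be the input to speech synthesis for that client's video.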
Ferras et al. (Mon,) studied this question.