AI-driven audio-to-video generation for dynamic content creation via stable diffusion and CNN-augmented transformers | Synapse