March 18, 2024 | Open Access

GPT-4 Driven Cinematic Music Generation Through Text Processing


Abstract

This paper presents Herrmann-1, a multimodal framework to generate background music tailored to movie scenes, by integrating state-of-the-art vision, language, music, and speech processing models. Our pipeline begins by extracting visual and speech information from a movie scene, performing emotional analysis on it, and converting these into descriptive texts. Then, GPT-4 translates these high-level descriptions into low-level music conditions. Finally, these text-based music conditions guide a text-to-music model to generate music that resonates with input movie scenes. Comprehensive objective and subjective evaluations attest to the high synthesis quality, congruence, and superiority of our pipeline.
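
The three stages described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `SceneDescription`, `build_condition_prompt`, and `generate_score` are hypothetical, the prompt wording and the choice of tempo, key, instrumentation, and mood as "low-level music conditions" are assumptions, and the language model and text-to-music model are passed in as plain callables so that no real API is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SceneDescription:
    """High-level text describing one movie scene (all fields are hypothetical)."""
    visual_caption: str  # text produced by some vision model
    transcript: str      # text produced by some speech recognizer
    emotion: str         # label produced by some emotion-analysis step


def build_condition_prompt(scene: SceneDescription) -> str:
    """Assemble a prompt asking a language model for low-level music conditions.

    The wording and the listed attributes are assumptions for illustration.
    """
    return (
        "Describe background music for this movie scene in terms of "
        "tempo, key, instrumentation, and mood.\n"
        f"Visuals: {scene.visual_caption}\n"
        f"Dialogue: {scene.transcript}\n"
        f"Emotion: {scene.emotion}"
    )


def generate_score(
    scene: SceneDescription,
    llm: Callable[[str], str],
    text_to_music: Callable[[str], bytes],
) -> bytes:
    """Chain the stages: scene text -> music conditions -> audio bytes."""
    prompt = build_condition_prompt(scene)
    music_conditions = llm(prompt)          # e.g. a GPT-4 wrapper supplied by the caller
    return text_to_music(music_conditions)  # e.g. a text-to-music model wrapper
```

Passing the two models in as callables keeps the sketch independent of any particular vendor API; a caller would supply thin wrappers around whichever language and music models they use.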


Cite This Study

Haseeb et al. (2024). GPT-4 Driven Cinematic Music Generation Through Text Processing.

synapsesocial.com/papers/68e73894b6db6435876b216a https://doi.org/10.1109/icassp48485.2024.10447950
