March 18, 2024 | Open Access

GPT-4 Driven Cinematic Music Generation Through Text Processing

Abstract

This paper presents Herrmann-1, a multimodal framework to generate background music tailored to movie scenes, by integrating state-of-the-art vision, language, music, and speech processing models. Our pipeline begins by extracting visual and speech information from a movie scene, performing emotional analysis on it, and converting these into descriptive texts. Then, GPT-4 translates these high-level descriptions into low-level music conditions. Finally, these text-based music conditions guide a text-to-music model to generate music that resonates with input movie scenes. Comprehensive objective and subjective evaluations attest to the high synthesis quality, congruence, and superiority of our pipeline.
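The three stages described in the abstract can be pictured with a minimal Python sketch. This is an illustration only, not the authors' implementation: the names `SceneDescription`, `build_condition_prompt`, and `generate_score` are hypothetical, the prompt wording and the example conditions it mentions (tempo, key, instrumentation) are assumptions, and the language model and text-to-music model are passed in as plain callables so that no real API is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SceneDescription:
    """Hypothetical container for the text extracted from one movie scene."""
    visual: str   # description of what is seen (from a vision model)
    speech: str   # transcript of what is said (from a speech model)
    emotion: str  # result of the emotional analysis


def build_condition_prompt(scene: SceneDescription) -> str:
    """Combine the high-level descriptions into one prompt for a language model.

    The wording and the example conditions are assumptions for illustration.
    """
    return (
        "Describe background music for this movie scene as low-level music "
        "conditions (for example tempo, key, instrumentation).\n"
        f"Visuals: {scene.visual}\n"
        f"Speech: {scene.speech}\n"
        f"Emotion: {scene.emotion}"
    )


def generate_score(
    scene: SceneDescription,
    describe_music: Callable[[str], str],   # stands in for the language model
    text_to_music: Callable[[str], bytes],  # stands in for the text-to-music model
) -> bytes:
    """Chain the stages: scene text -> music conditions -> audio bytes."""
    prompt = build_condition_prompt(scene)
    conditions = describe_music(prompt)
    return text_to_music(conditions)


if __name__ == "__main__":
    # Stub models so the sketch runs without any external service.
    scene = SceneDescription(
        visual="a car chase at night",
        speech="Hold on!",
        emotion="tense",
    )
    audio = generate_score(
        scene,
        describe_music=lambda prompt: "fast tempo, minor key, strings",
        text_to_music=lambda conditions: conditions.encode("utf-8"),
    )
    print(len(audio), "bytes of placeholder audio")
```

Passing the two models in as callables keeps the sketch independent of any particular provider; in a real system each stub would be replaced by a call to the corresponding model.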

Cite This Study

Haseeb et al. (2024) studied this question.

synapsesocial.com/papers/68e73894b6db6435876b216a https://doi.org/10.1109/icassp48485.2024.10447950
