Music generation is an active research topic in AI for music, attracting considerable attention from both academia and industry. Recent advances in deep learning, coupled with the growing availability of diverse music data, have driven the development of new approaches in this field. This paper provides a comprehensive survey of music generation, spanning statistical models, neural network architectures, and the pre-training and fine-tuning of large language models (LLMs). The survey adopts a broad view of music generation, covering symbolic, audio, and multimodal settings as well as autonomous, interactive, and co-creative generation. It further examines several aspects in depth, including key challenges, datasets, generative frameworks, and objective evaluation metrics. Most importantly, we synthesize recent progress in music generation and highlight promising directions for future research. By consolidating these perspectives, this survey reflects current trends in music generation and is intended as a reference for advancing research in the field.
Duan et al. studied this question.