As artificial intelligence enters music creation, automatic music generation based on generative models has become a research focus. Variational autoencoders (VAEs) are widely used for music generation because they can capture the latent distribution of the data and generate diverse content. However, existing music generation methods rely largely on a single source of training data, which leads to problems such as monotonous sound, a lack of structural coherence, and poor adaptability across music styles. In addition, the heterogeneity and redundancy of multi-source audio data make data fusion difficult, further limiting the quality of the generated music. This article proposes an optimization scheme for the VAE algorithm driven by audio data from multiple sources. First, a multi-source audio dataset is constructed, and heterogeneous data preprocessing and alignment strategies are developed to effectively integrate audio data from different sources (such as instrument sounds and environmental sound effects) and formats. Second, the VAE architecture is improved by introducing attention mechanisms and hierarchical encoding modules to strengthen the extraction of the temporal and multidimensional characteristics of music. Finally, the evidence lower bound (ELBO) objective function is optimized to alleviate posterior collapse during model training. Experimental results show that the optimized model outperforms the comparison models on all three evaluation metrics. Compared with the traditional VAE, the reconstruction error is reduced by 23 dB, and compared with the single-source-driven VAE, it is reduced by 10 dB. This indicates that multi-source data enables the model to capture audio features more accurately and to reduce reconstruction distortion.
The similarity of the generated audio is 18 percentage points higher than that of the traditional VAE and 5 percentage points higher than that of the single-source-driven VAE. The proposed scheme thus provides an effective technical approach for music creation tasks based on multiple sources.
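
To make the lower-bound objective concrete, the following is a minimal sketch of a negative ELBO for a single example, together with a linear KL warm-up, which is one common way to mitigate posterior collapse. The abstract does not state which modification the article actually uses, so the warm-up schedule, the squared-error reconstruction term, and the names `gaussian_kl`, `kl_weight`, and `negative_elbo` are all illustrative assumptions and not the article's method.

```python
import math


def gaussian_kl(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to the standard normal prior.

    mu and logvar are sequences with one entry per latent dimension.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar)
    )


def kl_weight(step, warmup_steps=1000):
    """Hypothetical linear warm-up: the KL weight grows from 0 to 1.

    Down-weighting the KL term early in training is one common heuristic
    against posterior collapse; it is assumed here for illustration only.
    """
    return min(1.0, step / warmup_steps)


def negative_elbo(x, x_recon, mu, logvar, beta=1.0):
    """Squared-error reconstruction term plus beta-weighted KL term.

    The squared error stands in for a Gaussian likelihood up to constants
    (an assumption; the source does not specify the likelihood).
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, logvar)
```

In a training loop, `beta=kl_weight(step)` would be passed to `negative_elbo` so that the encoder is not pushed toward the prior before the decoder has learned to use the latent code.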
Chaonan Ding (Thu,) studied this question.