What question did this study set out to answer?

This research aims to enhance music generation quality by optimizing the VAE algorithm with multi-source audio data.

June 4, 2026 | Open Access

Optimization of Computer Variational Autoencoder (VAE) Algorithm for Music Generation: Driven by Multi-Source Audio Data

Key points

Constructed a multi-source audio dataset for training.
Developed preprocessing and alignment strategies for heterogeneous audio data.
Enhanced VAE model structure with attention mechanisms and hierarchical encoding.
Reconstruction error reduced by 23 dB compared to traditional VAE.
Reconstruction error reduced by 10 dB compared to single-source VAE.
Generated audio similarity increased by 18 percentage points over traditional VAE.
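
The preprocessing and alignment point above can be illustrated with a minimal sketch. The summary does not describe the authors' actual pipeline, so everything here is an assumption: clips from different sources are brought to a common sample rate, peak-normalized, and padded or trimmed to one length. The function names (`resample_linear`, `peak_normalize`, `fit_length`, `align_sources`) are hypothetical, and linear interpolation stands in for a proper band-limited resampler.

```python
from typing import List, Tuple


def resample_linear(signal: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation (illustrative stand-in, not band-limited)."""
    if not signal or src_rate == dst_rate:
        return list(signal)
    n_out = max(1, round(len(signal) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = min(int(pos), len(signal) - 1)    # left neighbour, clamped to the end
        hi = min(lo + 1, len(signal) - 1)      # right neighbour, clamped to the end
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def peak_normalize(signal: List[float], peak: float = 1.0) -> List[float]:
    """Scale so the largest absolute sample equals `peak`; silence is returned unchanged."""
    top = max((abs(x) for x in signal), default=0.0)
    if top == 0.0:
        return list(signal)
    return [x * peak / top for x in signal]


def fit_length(signal: List[float], length: int) -> List[float]:
    """Zero-pad or trim to exactly `length` samples."""
    return (list(signal) + [0.0] * length)[:length]


def align_sources(sources: List[Tuple[List[float], int]],
                  target_rate: int, length: int) -> List[List[float]]:
    """Bring (samples, sample_rate) pairs to one rate, one level, and one length."""
    return [
        fit_length(peak_normalize(resample_linear(samples, rate, target_rate)), length)
        for samples, rate in sources
    ]


# Two toy "sources" recorded at different (hypothetical) sample rates.
aligned = align_sources([([0.0, 0.5, -0.25, 0.1], 4), ([0.0, 2.0], 2)],
                        target_rate=4, length=6)
```

After this step every clip has the same rate, level, and length, so clips from different sources can be batched together for training. A real pipeline would likely also convert the clips to a shared time-frequency representation, which this summary does not specify.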

Abstract

As artificial intelligence enters music creation, automatic composition based on generative models has become a research focus. Variational autoencoders (VAEs) are widely used in music generation because they can capture the latent distribution of data and generate diverse content. Existing music generation methods rely largely on a single source of training data, which leads to monotonous sound quality, a lack of structural coherence, and poor adaptability across music styles. In addition, the heterogeneity and redundancy of multi-source audio data make data fusion difficult, further limiting the quality of generated music. This article proposes an optimization scheme for the VAE algorithm driven by multi-source audio data. First, a multi-source audio dataset is constructed, and heterogeneous data preprocessing and alignment strategies are developed to integrate audio data from different sources (instrument volume, sound, environmental sound effects) and formats. Second, the VAE model structure is improved by introducing attention mechanisms and hierarchical encoding modules to enhance extraction of the temporal and multidimensional characteristics of music. Finally, the lower-bound objective function is optimized to alleviate posterior collapse during model training. Experimental results show that the optimized model outperforms the comparison models on all three indicators. Compared with traditional VAE, the reconstruction error is reduced by 23 dB, and compared with single-source-driven VAE, it is reduced by 10 dB. This indicates that multi-source data enables the model to capture audio features more accurately and reduces reconstruction distortion. The similarity of the generated audio is 18 percentage points higher than that of traditional VAE and 5 percentage points higher than that of single-source-driven VAE. This provides an effective technical approach for music creation tasks based on multi-source data.
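
The abstract does not say how the lower-bound objective was modified, so the following is only a sketch of one commonly used remedy for posterior collapse, linear KL annealing, and not the authors' method. It assumes a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term. The function names and the `warmup_steps` parameter are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def kl_weight(step: int, warmup_steps: int, max_weight: float = 1.0) -> float:
    """Linear KL annealing: the weight ramps from 0 to max_weight over warmup_steps."""
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)


def negative_elbo(x: Sequence[float], x_hat: Sequence[float],
                  mu: Sequence[float], logvar: Sequence[float],
                  weight: float) -> float:
    """Squared-error reconstruction term plus a weighted KL term (loss to minimize)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + weight * gaussian_kl(mu, logvar)


# Early in training the KL term is down-weighted, so the decoder is pushed
# to use the latent code before the prior-matching pressure is fully applied.
loss_early = negative_elbo([1.0, 0.0], [0.5, 0.0], mu=[1.0], logvar=[0.0],
                           weight=kl_weight(step=50, warmup_steps=100))
```

With a weight of 1 this is the standard negative evidence lower bound, up to constants from the Gaussian likelihood. Smaller weights early in training reduce the incentive for the encoder to collapse onto the prior.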
