This study proposes a composite model for opera audio recognition and style generation that integrates chaotic fingerprint coding, deep neural networks, and generative networks for style transfer. The model uses a 20-bit chaotic audio fingerprint based on logistic mapping and time-frequency peaks, which achieves efficient compression and robust recognition. In a noisy environment the method reaches an accuracy of 92.3%, which is 12.5% higher than that of traditional methods. A DNN-LightGBM cascade structure models the features and classifies 19 opera categories with an accuracy of 88-95%. For style transfer, a generative adversarial network with an orthogonal style loss function separates timbre from style and reduces Mel cepstral distortion by 18.3%, from 5.24 to 4.87. In addition, a spectrum-based unsupervised linear style encoder improves the robustness of the transfer by 23.6% under various accompaniment conditions. Overall, the framework offers high recognition accuracy, high-quality style transfer, and strong adaptability.
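As a rough illustration of how a logistic map can turn time-frequency peaks into a 20-bit code, the following minimal sketch may help. It is not the authors' algorithm: the peak-to-seed mapping (`peaks_to_seed`), the map parameter `r = 3.99`, the burn-in length, and the 0.5 threshold are all assumptions chosen for illustration, and the sketch makes no attempt to reproduce the noise robustness reported above.

```python
from typing import Iterable, Tuple


def logistic_bits(seed: float, n_bits: int = 20, r: float = 3.99,
                  burn_in: int = 100) -> int:
    """Iterate the logistic map x -> r*x*(1-x) and pack thresholded states into an int.

    The values of r, burn_in, and the 0.5 threshold are illustrative assumptions.
    """
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must lie strictly between 0 and 1")
    x = seed
    # Discard the initial transient so the emitted bits come from the chaotic regime.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    code = 0
    for _ in range(n_bits):
        x = r * x * (1.0 - x)
        # One bit per iteration: 1 if the state is in the upper half of the interval.
        code = (code << 1) | (1 if x >= 0.5 else 0)
    return code


def peaks_to_seed(peaks: Iterable[Tuple[int, int]]) -> float:
    """Hypothetical mapping from quantized (frame, frequency-bin) peaks to a seed in (0, 1)."""
    acc = 0
    for frame, freq_bin in peaks:
        acc = (acc * 31 + frame * 1009 + freq_bin) % 1_000_003
    # acc is in [0, 1_000_002], so the result is strictly between 0 and 1.
    return (acc + 1) / 1_000_005


def fingerprint(peaks: Iterable[Tuple[int, int]]) -> int:
    """Return a 20-bit integer code for a list of spectrogram peaks."""
    return logistic_bits(peaks_to_seed(peaks))


if __name__ == "__main__":
    example_peaks = [(0, 12), (3, 40), (7, 25)]  # made-up peak coordinates
    print(format(fingerprint(example_peaks), "020b"))
```

Identical peak lists always yield the same code, and the result always fits in 20 bits. Because a chaotic map is sensitive to its initial value, a practical scheme would need to quantize or otherwise stabilize the peaks before seeding; how the paper does this is not stated in the abstract.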
Jie Pan (Thu,) studied this question.