With the rapid expansion of digital music content, efficient and accurate music feature extraction and style generation have become critical research problems in music information processing. Manual labeling cannot keep pace with large-scale music collections because it is slow and subjective. Deep learning, particularly Long Short-Term Memory (LSTM) networks, has shown significant promise for capturing the temporal and stylistic characteristics of musical sequences. This paper introduces an integrated framework that combines spectral and cepstral analysis to systematically derive pitch, resonance-peak, and timbre features: a short-time Fourier transform (STFT) is applied to overlapping 20 ms frames with a 10 ms hop, and the log-power cepstrum and Mel-frequency cepstral coefficients (MFCCs) are extracted from each frame. These handcrafted feature vectors are concatenated with learned embeddings and fed into a bidirectional LSTM (BiLSTM), so that the model can exploit both explicit frequency-domain cues and long-range temporal dependencies. By processing each sequence in both the forward and backward directions, the BiLSTM achieves higher genre classification accuracy than unidirectional variants and produces more coherent musical transitions. In tests on five genres (Jazz, Classical, Rock, Country, and Disco), our framework achieved an average classification accuracy of 81.2%, an F1-score of 0.79, and a Mean Opinion Score of 4.1/5 for stylistic coherence in a blind listening study. Quantitative evaluations of harmonic-progression consistency (85% of the original chord transitions retained) and dynamic-contour reproduction (a Pearson correlation of 0.82 with the source note velocities) further demonstrate that the model generalizes to musical elements that are not genre-specific.
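To make the framing and cepstral step concrete, the following is a minimal, self-contained sketch of cepstral pitch estimation on 20 ms frames with a 10 ms hop. It is not the authors' implementation. The Hamming window, the 60 to 400 Hz pitch search range, and the helper names `frame_signal`, `log_power_spectrum`, `real_cepstrum` and `cepstral_pitch` are all assumptions introduced for illustration. The sketch uses a naive DFT in pure Python for portability rather than an FFT, and it omits the MFCC branch and the BiLSTM entirely.

```python
import math

# Framing parameters taken from the abstract: 20 ms frames, 10 ms hop.
FRAME_MS = 20.0
HOP_MS = 10.0


def frame_signal(signal, sample_rate, frame_ms=FRAME_MS, hop_ms=HOP_MS):
    """Split a mono signal into overlapping frames; trailing samples that
    do not fill a whole frame are dropped."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return [signal[i * hop_len:i * hop_len + frame_len] for i in range(n_frames)]


def hamming(n):
    """Symmetric Hamming window (an assumption; the abstract names no window)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def log_power_spectrum(frame, eps=1e-12):
    """Log power spectrum of a windowed frame via a naive O(N^2) DFT."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    log_spec = []
    for k in range(n):
        re = sum(x[t] * math.cos(2.0 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2.0 * math.pi * k * t / n) for t in range(n))
        log_spec.append(math.log(re * re + im * im + eps))
    return log_spec


def real_cepstrum(log_spec, quefrencies):
    """Inverse DFT of the (real, even) log spectrum at the given quefrencies.
    For a real-valued frame the sine terms cancel, so a cosine sum suffices."""
    n = len(log_spec)
    return {
        q: sum(log_spec[k] * math.cos(2.0 * math.pi * k * q / n) for k in range(n)) / n
        for q in quefrencies
    }


def cepstral_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate f0 (Hz) from the largest cepstral value whose quefrency maps
    to a pitch between fmin and fmax, with quefrency capped at N/2 samples."""
    n = len(frame)
    q_lo = max(1, int(sample_rate / fmax))
    q_hi = min(n // 2, int(sample_rate / fmin))
    ceps = real_cepstrum(log_power_spectrum(frame), range(q_lo, q_hi + 1))
    best_q = max(ceps, key=ceps.get)
    return sample_rate / best_q


if __name__ == "__main__":
    sr = 8000
    # Synthetic tone: 200 Hz fundamental plus harmonics up to 3.8 kHz, 0.1 s long.
    tone = [sum(math.cos(2.0 * math.pi * 200 * h * n / sr) for h in range(1, 20))
            for n in range(int(0.1 * sr))]
    frames = frame_signal(tone, sr)
    print(len(frames), [round(cepstral_pitch(f, sr), 1) for f in frames[:3]])
```

With a 200 Hz test tone sampled at 8 kHz, each 160-sample frame spans four pitch periods of 40 samples, so the cepstral peak should appear at quefrency 40 (8000 / 40 = 200 Hz). A practical pipeline would replace the naive DFT with an FFT (for example `numpy.fft.rfft`) and compute MFCCs from the same frames.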
Experimental results confirm that the proposed method generates high-quality, stylistically faithful music across diverse genres, offering an automated, efficient solution for both music analysis and creative generation.
Zhongling Tong studied this question.