This paper explores how artificial intelligence (AI) can drive reform and innovation in music teaching. By combining a Dilated Convolutional Neural Network (DCNN) with an attention mechanism, an audio recognition model, the Multi-Branch Fusion Network Based on Dilated Convolution and Attention Mechanism (MBFN-DCAM), is constructed to improve the accuracy and efficiency of music audio recognition. Comparative experiments show that MBFN-DCAM outperforms six representative state-of-the-art (SOTA) models, including Audio Spectrogram Transformer (AST) and ResNet-50, achieving a recognition accuracy of 95.65% ± 0.35% (p < 0.01). In a randomized controlled trial (n = 60), feedback provided by the model significantly improved students’ pitch accuracy and raised their scores on the Music Self-Efficacy Scale (MES). The model's inference latency on a Jetson Nano is 82.4 ms, which supports its deployment in resource-constrained environments. Together, these results offer an efficient, robust, and educationally effective technical approach to intelligent music teaching evaluation.
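To make the two building blocks named above concrete, the following is a minimal sketch in plain Python of a multi-branch dilated convolution whose outputs are fused by softmax attention weights. It is not the MBFN-DCAM architecture itself: the function names, the fixed kernel, and the scoring rule (mean absolute activation per branch) are illustrative assumptions, and a real model would learn both the kernels and the attention scores.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated convolution (cross-correlation form)."""
    # Distance between the first and last tap of the dilated kernel.
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(signal, kernel, dilations):
    """Run one branch per dilation rate, then fuse with attention weights."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    length = min(len(b) for b in branches)  # crop to the shortest branch
    branches = [b[:length] for b in branches]
    # Assumption: score each branch by its mean absolute activation.
    scores = [sum(abs(v) for v in b) / length for b in branches]
    weights = softmax(scores)
    fused = [
        sum(w * b[i] for w, b in zip(weights, branches))
        for i in range(length)
    ]
    return fused, weights
```

Calling `fuse_branches([1, 2, 3, 4, 5, 6], [1, 0, -1], [1, 2])` runs two branches over the same input. The branch with dilation 2 covers five input samples per output with the same three-tap kernel, which is the property that lets dilated convolutions widen the receptive field without adding parameters.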
Liu et al. studied this question.