This paper explores the application of the Fourier Transform for automatic musical instrument recognition in audio recordings. Given the increasing complexity of musical compositions and the need for efficient audio classification, the study focuses on extracting detailed spectral features from sound signals using the Fast Fourier Transform (FFT). These features include spectral centroid, bandwidth, roll-off, zero-crossing rate (a time-domain measure computed alongside the spectral ones), and Mel-Frequency Cepstral Coefficients (MFCCs). Together they capture the frequency-based characteristics that distinguish different instruments. The extracted features are processed and used to train machine learning models. Specifically, the paper evaluates the performance of two classification algorithms: Approximate Nearest Neighbor (ANN) and Support Vector Machine (SVM). The models are trained on a dataset of short mono-instrument recordings and tested on mixed-instrument samples to assess how well they generalize. The experimental results show that both models classify instruments with high accuracy (over 96% in controlled environments). However, accuracy decreases in complex polyphonic recordings because the instruments' frequencies overlap. The study also highlights the role of libraries such as Librosa, NumPy, and Scikit-learn in preprocessing and model training. The findings suggest that the proposed approach is poorly suited to overlapping instruments, as in orchestral recordings, but is highly effective for solo instrument classification and can be extended to tasks such as genre recognition. Future research could incorporate deep learning techniques and sound source separation to improve performance in polyphonic settings.
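The sketch below is a rough illustration of how the FFT-based descriptors named above can be computed. It derives spectral centroid, bandwidth, and roll-off from one frame's magnitude spectrum and adds the time-domain zero-crossing rate. It is not the authors' code, and several choices are assumptions: the hypothetical function name `spectral_features`, the 85% roll-off threshold, and treating the whole signal as a single unwindowed frame. Librosa's `librosa.feature` module computes frame-wise versions of these descriptors, as well as MFCCs, which are omitted here.

```python
import numpy as np


def spectral_features(signal: np.ndarray, sr: int, rolloff_pct: float = 0.85) -> dict:
    """Hypothetical helper: FFT-based descriptors for one mono frame.

    Assumes the whole signal is a single frame (no windowing or hopping),
    unlike frame-wise library implementations.
    """
    # Magnitude spectrum of the real-valued signal and the bin frequencies in Hz
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    total = mags.sum()

    # Spectral centroid: magnitude-weighted mean frequency
    centroid = float((freqs * mags).sum() / total)

    # Spectral bandwidth: magnitude-weighted spread around the centroid
    bandwidth = float(np.sqrt(((freqs - centroid) ** 2 * mags).sum() / total))

    # Roll-off: lowest bin frequency below which rolloff_pct of the magnitude lies
    cumulative = np.cumsum(mags)
    idx = min(int(np.searchsorted(cumulative, rolloff_pct * total)), len(freqs) - 1)
    rolloff = float(freqs[idx])

    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    signs = np.signbit(signal)
    zcr = float(np.mean(signs[1:] != signs[:-1]))

    return {"centroid": centroid, "bandwidth": bandwidth, "rolloff": rolloff, "zcr": zcr}


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr  # one second of samples
    tone = np.sin(2 * np.pi * 440 * t)  # pure 440 Hz test tone
    print(spectral_features(tone, sr))
```

For a pure 440 Hz tone, the centroid and roll-off both land near 440 Hz and the bandwidth is close to zero. A real instrument note carries harmonics that raise the centroid and bandwidth, and differences of this kind are what let such descriptors separate timbres. In a pipeline like the one described, vectors of these values (plus MFCCs) would then be fed to a classifier such as an SVM.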
Kuz et al. studied this question.