Large music collections often contain several recordings of the same piece of music, interpreted by different musicians and possibly arranged for different instrumentations. Given a short query audio clip, an important task in audio retrieval is to identify, automatically and efficiently, all corresponding audio clips irrespective of the specific interpretation or instrumentation. For this problem, also referred to as audio matching, the main contribution of this paper is a new type of audio feature that correlates strongly with the harmonic progression of the audio signal. In addition, the feature shows a high degree of robustness to variations in dynamics, timbre, articulation, and local tempo. The feature design is carried out in two stages, essentially taking short-time statistics over chroma-based energy distributions. Here, the chroma correspond to the 12 traditional pitch classes of the equal-tempered scale. Applied to audio matching on a large audio database covering a wide range of classical music (112 hours of audio material), our features proved to be a powerful tool, providing accurate matches efficiently in terms of both time and memory requirements.
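The two-stage design described above can be illustrated with a minimal Python sketch. The abstract gives no parameter values, so everything concrete here is an assumption made for illustration: the quantization thresholds, the smoothing window length, the hop size, the cosine-style matching distance, and the function names `chroma_statistics` and `match_curve`. The input is assumed to be a precomputed 12 x N matrix of non-negative chroma energies, one column per frame.

```python
import numpy as np

def chroma_statistics(chroma, thresholds=(0.05, 0.1, 0.2, 0.4), window=41, hop=10):
    """Two-stage short-time statistics over chroma energy distributions.

    chroma: array of shape (12, n_frames) with non-negative energies.
    thresholds, window, hop: illustrative values, not taken from the abstract.
    """
    chroma = np.asarray(chroma, dtype=float)
    if chroma.shape[0] != 12 or chroma.shape[1] < window:
        raise ValueError("expected shape (12, n_frames) with n_frames >= window")

    # Stage 1: turn each frame into a relative energy distribution over the
    # 12 pitch classes, then quantize coarsely to suppress dynamics and timbre.
    total = chroma.sum(axis=0, keepdims=True)
    rel = np.divide(chroma, total, out=np.zeros_like(chroma), where=total > 0)
    quantized = np.zeros_like(rel)
    for t in thresholds:
        quantized += (rel >= t)  # counts how many thresholds each entry reaches

    # Stage 2: smooth each pitch class over time with a Hann window and
    # downsample, which absorbs local tempo and articulation differences.
    kernel = np.hanning(window)
    kernel /= kernel.sum()
    smoothed = np.stack([np.convolve(row, kernel, mode="same") for row in quantized])
    feats = smoothed[:, ::hop]

    # Normalize every feature vector to unit Euclidean length; all-zero
    # columns are replaced by a uniform unit vector.
    norms = np.linalg.norm(feats, axis=0, keepdims=True)
    uniform = np.full((12, 1), 1.0 / np.sqrt(12))
    return np.where(norms > 0, feats / np.maximum(norms, 1e-12), uniform)

def match_curve(query, database):
    """Distance of the query to every same-length window of the database.

    Uses 1 minus the mean inner product of aligned unit vectors, so values
    near 0 indicate a likely match. This distance is an assumption.
    """
    m, n = query.shape[1], database.shape[1]
    if m > n:
        raise ValueError("query is longer than the database sequence")
    return np.array([
        1.0 - np.mean(np.sum(query * database[:, i:i + m], axis=0))
        for i in range(n - m + 1)
    ])
```

Local minima of the curve returned by `match_curve` would be the candidate matches. Because the features are coarse and heavily downsampled, the sequences compared are short, which is one way the time and memory efficiency mentioned above could come about.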
Müller et al. studied this question.