What question did this study set out to answer?

This research aims to develop a framework for accurately detecting multiple avian vocalizations from audio field recordings using inductive learning techniques.

April 25, 2026

Spectrogram-derived graphs and inductive learning for multi-label avian vocalization detection in field recordings

Key points

Goal: a framework that accurately detects multiple avian vocalizations in audio field recordings using inductive learning.
Uses inductive spatial geometric deep learning networks for multi-label classification.
Constructs a graph from each Mel-spectrogram using features from a trained deep convolutional neural network (Deep CNN).
Evaluates the framework on the Xeno-canto bird sound database and compares it with existing methods.
Achieves macro F1-scores of 0.90 with GraphSAGE and 0.92 with GAT.
Replacing the Deep CNN with AudioProtoPNet-20 and classifying with GAT yields a macro F1-score of 0.93.
The proposed method outperforms the state-of-the-art approaches it was compared against on multi-label vocalization detection.
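
The macro F1-scores quoted above average a per-label F1 over all labels, so rare and common species count equally. The sketch below shows that calculation on a small multi-label example. The function name `macro_f1` and the toy labels are hypothetical and are not taken from the study; a label with no true or predicted positives is scored 0.0 here, which is one common convention among several.

```python
def macro_f1(y_true, y_pred):
    """Macro F1 for multi-label data: per-label F1, then an unweighted mean.

    y_true, y_pred: lists of equal-length 0/1 lists, one row per recording.
    """
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in pairs if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in pairs if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives at all contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data (hypothetical): 4 recordings, 3 species labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
score = macro_f1(y_true, y_pred)
```

In this toy case the per-label F1 values are 1, 2/3, and 2/3, so the macro average is 7/9.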

Abstract

This paper presents a methodology that employs inductive spatial geometric deep learning networks to detect multiple avian vocalizations from field recordings. First, a trained deep convolutional neural network (Deep CNN) extracts features from the Mel-spectrogram of each audio file, and these features are used to build a node-feature graph. The graph is then processed by two spatial inductive graph-based models, graph sample and aggregation (GraphSAGE) and the graph attention network (GAT), for multi-label classification. To enhance the robustness and generalization of the Deep CNN, SpecAugment is applied to generate additional Mel-spectrograms via data augmentation. The proposed framework is evaluated on the Xeno-canto bird sound database and compared against state-of-the-art methods. The results show that the proposed inductive spatial graph-based approach outperforms existing techniques, achieving macro F1-scores of 0.90 with GraphSAGE and 0.92 with GAT. Replacing the Deep CNN with AudioProtoPNet-20 and evaluating GAT on the Xeno-canto dataset gives a macro F1-score of 0.93.
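
To make the graph step more concrete, here is a minimal sketch of building a graph from feature vectors and applying one GraphSAGE-style mean aggregation. Several things are assumptions, not details from the abstract: the nodes are taken to be CNN feature vectors, edges are chosen by k-nearest neighbors under cosine similarity, and the names `knn_graph` and `sage_mean_layer` are hypothetical. The layer also omits the learned weight matrix and nonlinearity that GraphSAGE applies after aggregation, and GAT would replace the plain mean with attention-weighted neighbor sums.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_graph(features, k):
    """Return {node: [k most similar other nodes]} (assumed edge rule)."""
    neighbors = {}
    for i, fi in enumerate(features):
        sims = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        sims.sort(key=lambda s: (-s[0], s[1]))  # most similar first
        neighbors[i] = [j for _, j in sims[:k]]
    return neighbors


def sage_mean_layer(features, neighbors):
    """One mean-aggregation step without learned weights: concatenate
    each node's vector with the mean of its neighbors' vectors."""
    out = []
    for i, fi in enumerate(features):
        nbrs = [features[j] for j in neighbors[i]]
        if nbrs:
            mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        else:
            mean = [0.0] * len(fi)
        out.append(list(fi) + mean)
    return out


# Toy node features (hypothetical stand-ins for CNN embeddings).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_graph(features, k=1)
embedded = sage_mean_layer(features, graph)
```

Because the aggregation depends only on a node's own features and those of its neighbors, the same function can be applied to nodes that were not seen during training, which is what makes this family of models inductive.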
