What question did this study set out to answer?

The study aims to develop and evaluate automated systems for classifying humpback whale calls using different neural network architectures and feature representations.

January 23, 2026 · Open Access

Automated Classification of Humpback Whale Calls Using Deep Learning: A Comparative Study of Neural Architectures and Acoustic Feature Representations

Key points

Compiled and manually curated whale call audio from publicly available repositories.
Applied standard data-augmentation techniques to diversify and expand the dataset.
Designed and trained multiple neural networks using TensorFlow and Keras.
Evaluated model accuracy and robustness across mel spectrogram and MFCC feature representations.
The pretrained MobileNetV2 model achieved 99.01% test accuracy with mel spectrograms.
The custom CNN reached 98.92% accuracy with a false negative rate of only 0.75%.
MFCC-based models showed lower robustness and higher false negative rates than models trained on mel spectrograms.
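
Both feature representations compared above are built on the mel scale, which spaces frequency bands more densely at low frequencies than at high ones. The sketch below illustrates that spacing using the widely used HTK-style conversion formula. It is only an illustration: the source does not state which mel variant, frequency range, or number of bands the study used, so `f_min`, `f_max`, and `n_bands` are hypothetical parameters. MFCCs are conventionally obtained by applying a discrete cosine transform to the log energies of such bands.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (HTK-style formula, one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(f_min, f_max, n_bands):
    """Return centre frequencies (Hz) of n_bands bands equally spaced in mels.

    f_min, f_max, and n_bands are illustrative values, not the study's settings.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_bands)]


# Hypothetical example: 10 bands between 0 Hz and 8 kHz.
for centre in mel_band_centers(0.0, 8000.0, 10):
    print(f"{centre:8.1f} Hz")
```

The printed centre frequencies are close together at the low end and spread out at the high end, which is the property a mel spectrogram inherits.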

Abstract

Passive acoustic monitoring (PAM) using hydrophones enables acoustic data to be collected in large and diverse quantities, creating a need for a reliable automated classification system. This paper presents a data-processing pipeline and a set of neural networks designed for a humpback whale detection system. A collection of audio segments is compiled from publicly available audio repositories and extensively curated by hand, with thorough examination, editing, and clipping, to produce a dataset that minimizes bias and categorization errors. An array of standard data-augmentation techniques is applied to the collected audio, diversifying and expanding the original dataset. Multiple neural networks are designed and trained using the TensorFlow 2.20.0 and Keras 3.13.1 frameworks, resulting in a custom architecture developed through research and iterative improvement. The pretrained MobileNetV2 model is also included for further analysis. Model performance depends strongly on both feature representation and network architecture. Mel spectrogram inputs consistently outperformed MFCC (mel-frequency cepstral coefficient) features across all model types. The highest performance was achieved by the pretrained MobileNetV2 using mel spectrograms without augmentation, reaching a test accuracy of 99.01%, with balanced precision and recall of 99% and a Matthews correlation coefficient of 0.98. The custom CNN with mel spectrograms also performed strongly, with 98.92% accuracy and a false negative rate of only 0.75%. In contrast, models trained on MFCC representations were consistently less robust and had higher false negative rates. These results highlight the comparative strengths of the evaluated feature representations and network architectures for humpback whale detection.
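
The metrics reported in the abstract (accuracy, precision, recall, false negative rate, and Matthews correlation coefficient) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch using the standard definitions, not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: whale calls correctly detected / missed.
    tn/fp: non-whale segments correctly rejected / wrongly flagged.
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    positives = tp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / positives if positives else 0.0
    false_negative_rate = fn / positives if positives else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "false_negative_rate": false_negative_rate,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; these are not the study's results.
example = binary_metrics(tp=990, fp=15, tn=985, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

The false negative rate is the complement of recall, which is why a detector tuned to avoid missed calls is usually judged on both. The Matthews correlation coefficient uses all four counts, so it stays informative when the classes are imbalanced.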
