Deep learning methods are increasingly applied to categorize animal vocalizations and have the potential to contribute to the complex task of determining structured call repertoires. In this study, we used the DeepAcoustics tool to evaluate two convolutional neural network architectures—TinyYOLO (lightweight) and DarkNet (heavyweight)—for multiclass detection of predefined baleen whale call types in long-term acoustic recordings from Antarctica and the North Pacific. Both networks were pre-trained on the COCO imagery dataset and adapted for spectrogram-based classification. We assessed how effectively each network identifies specific call types within a constrained repertoire and between species, including blue whale A, B, D, and Z calls, as well as 20- and 40-Hz fin whale calls. We also examined the potential of using bounding-box detections to group calls into broader acoustic categories—such as downsweeps or tonal units—as a means of coarse repertoire assessment. Our findings highlight strengths and limitations in applying object detection to marine bioacoustics and emphasize the importance of network architecture selection and customization in supporting accurate call-type categorization and comparative acoustic analysis across and within species.
Alongi et al. studied this question.