This study presents a real-time, on-device bird sound recognition system developed using deep transfer learning and optimized for mobile deployment. A curated corpus covering 610 Taiwanese bird species, drawn from Xeno-canto (an open-access repository of wildlife sound recordings contributed by citizen scientists worldwide), was used to evaluate six deep learning architectures: Residual Network-18 (ResNet-18), Yet Another Mobile Network (YAMNet), Visual Geometry Group-like Network for Audio Classification (VGGish), Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM), Attention-based Convolutional Neural Network (Attention-CNN), and a Deep Neural Network (DNN) baseline. All models were trained using class weighting, batch normalization, a dropout rate of 0.2, and targeted data augmentation, including pitch shifting (±2 semitones), time stretching (0.8–1.2), and time shifting (16,000 samples). Among these, ResNet-18 achieved the best balance between accuracy and computational efficiency, with an overall accuracy of 0.955, macro-precision of 0.95, macro-recall of 0.94, and macro-F1 of 0.945 across all 610 classes. The model performs inference in 25.9 milliseconds with a memory footprint of only 3.03 megabytes (approximately 795,000 parameters), outperforming heavier architectures such as VGGish (0.8975 accuracy, 42.2 milliseconds, 587 megabytes) while remaining competitive with compact alternatives such as YAMNet (0.935 accuracy, 27.0 milliseconds, 10.19 megabytes). Furthermore, Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations indicate that predictions are driven by species-specific temporal–spectral patterns rather than background noise. Converting the optimized model to TensorFlow Lite enables fully offline inference on Android devices, eliminating cloud latency and keeping audio on the device for user privacy. Overall, this lightweight, high-accuracy framework offers a scalable and practical solution for real-time biodiversity monitoring and conservation research.
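The macro-averaged metrics quoted above can be illustrated with a minimal sketch. It assumes integer class labels and equal weighting of every class, which is the usual meaning of "macro". The function names `macro_scores` and `harmonic_mean` are hypothetical and are not taken from the paper. The abstract does not say how its macro-F1 was aggregated. The sketch shows the common per-class-F1 average, and also that the reported 0.945 is numerically consistent with the harmonic mean of the reported macro-precision (0.95) and macro-recall (0.94).

```python
from typing import Sequence, Tuple


def macro_scores(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> Tuple[float, float, float]:
    """Return (macro-precision, macro-recall, macro-F1).

    Each metric is computed per class and then averaged with equal
    weight, so rare species count as much as common ones.
    """
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = float(num_classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def harmonic_mean(p: float, r: float) -> float:
    """Harmonic mean of two scores (the F1 formula applied to p and r)."""
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Toy three-class example (illustrative labels, not data from the paper).
    truth = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    print(macro_scores(truth, preds, num_classes=3))
    # Consistency check on the figures reported in the abstract.
    print(round(harmonic_mean(0.95, 0.94), 3))
```

The two aggregation conventions generally give different values, so the match with the harmonic mean is a plausibility check rather than evidence of the authors' procedure.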
Hailemariam Abebe Endalamaw
C. C. Yang
Cheng-Hung Hsu
Multimedia Tools and Applications
National Taiwan University of Science and Technology
Endalamaw et al. studied this question.
www.synapsesocial.com/papers/69a75ddbc6e9836116a28216 — DOI: https://doi.org/10.1007/s11042-026-21211-y