In this work, we propose an embedded, low-processing-cost machine learning solution to support environmental acoustic monitoring. The pre-processing stage applies the Wavelet Packet Transform to generate low-dimensional features, which serve as inputs to a Gradient Boosting model for near-real-time classification of relevant sound events. We also introduce an event filter that checks whether a relevant sound event is currently occurring: features are forwarded to the model only when an event is detected and are otherwise discarded. This approach makes the solution more robust to noise and wind-contaminated samples while reducing memory, battery, and computational usage. Finally, we ported the processing pipeline and the trained model to the C programming language and embedded them in the Nordic Thingy:53, a low-power device with a built-in digital Pulse Density Modulation (PDM) microphone (Vesper VM3011). To evaluate the proposed method, we compared it with a convolutional neural network approach based on Mel-frequency cepstral coefficients, using audio recordings of bird species found in forests in central and western Brazil as well as recordings of sounds related to human activity. The favorable classification scores, together with the long battery life of the embedded solution, could greatly reduce the need for extensive environmental monitoring field surveys.
Junqueira et al. studied this question.