Construction workplaces often face unforeseen struck-by equipment hazards that lead to severe injuries and fatalities among workers. Detecting and localizing equipment sounds from multi-channel audio data has drawn research interest, but collecting such data to develop machine learning models for sound detection and localization is challenging. On-site recordings suitable for deep learning are often infeasible because heterogeneous construction sounds lack proper sound attribute labels. This paper introduces a novel method for synthesizing overlapping and non-overlapping sound datasets in three-dimensional space using Pyroomacoustics. The approach uses single-sound data with attributes such as start time, end time, azimuth, and elevation as microphone input to generate multi-channel audio output. The study simulates 5,025 distinct audio scenarios for both datasets from seven single-sound recordings. The resulting large dataset can be used to train neural network models that localize equipment collision hazards on construction sites.
Elelu et al. studied this question.
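The abstract describes each sound event by a start time, end time, azimuth, and elevation. As a rough illustration of how such attributes could be turned into simulation inputs, the sketch below converts a direction and distance into a 3D source position and checks whether two events overlap in time. It is not the authors' code: the function names, the event dictionaries, the source distance, the sampling rate, and the angle convention (azimuth counter-clockwise from the +x axis, elevation above the horizontal plane) are all assumptions made for this example. In a Pyroomacoustics simulation, a position like this would typically be passed to `Room.add_source` along with the signal and its delay.

```python
import math


def source_position(mic_center, azimuth_deg, elevation_deg, distance_m):
    """Convert a direction and distance (relative to the microphone array
    center) into Cartesian x, y, z coordinates.

    Assumed convention: azimuth is measured counter-clockwise from the +x
    axis in the horizontal plane; elevation is measured upward from that plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    dx = distance_m * math.cos(el) * math.cos(az)
    dy = distance_m * math.cos(el) * math.sin(az)
    dz = distance_m * math.sin(el)
    cx, cy, cz = mic_center
    return (cx + dx, cy + dy, cz + dz)


def events_overlap(a, b):
    """Return True if two events share any stretch of time."""
    return a["start_s"] < b["end_s"] and b["start_s"] < a["end_s"]


def sample_range(event, fs):
    """Map an event's start and end times (seconds) to sample indices."""
    return (round(event["start_s"] * fs), round(event["end_s"] * fs))


# Hypothetical events; labels and values are illustrative only.
FS = 16000  # assumed sampling rate in Hz
excavator = {"label": "excavator", "start_s": 1.0, "end_s": 4.0,
             "azimuth_deg": 30.0, "elevation_deg": 10.0}
truck = {"label": "truck", "start_s": 3.0, "end_s": 6.0,
         "azimuth_deg": -120.0, "elevation_deg": 0.0}

mic_center = (5.0, 5.0, 1.5)  # assumed array center in metres
for ev in (excavator, truck):
    pos = source_position(mic_center, ev["azimuth_deg"],
                          ev["elevation_deg"], distance_m=3.0)
    print(ev["label"], pos, sample_range(ev, FS))

print("overlapping scenario:", events_overlap(excavator, truck))
```

A non-overlapping scenario would use events whose time ranges are disjoint, so that `events_overlap` returns `False` for every pair.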