Spatial Semantic Segmentation of Sound Scenes (S5) aims to advance technologies for detecting and separating sound events from multi-channel input signals in which multiple sound events are mixed together with spatial information. One possible application of S5 is extended reality (XR) services that capture a user’s surrounding acoustic scene and transmit it to remote participants. Because the mixture is decomposed into individual sound objects paired with class labels and six-degrees-of-freedom (6DoF) metadata, the rendering engine can update each object's direction and distance in real time as the listener moves. S5 can also support assisted living through room sound monitoring. The S5 task was adopted as DCASE2025 Task4; its setting within the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge and the newly recorded DCASE2025 Task4 Dataset are outlined. In this presentation, we relate S5 to past DCASE tasks, describe the new dataset, and discuss current challenges and future directions for S5.
Yasuda et al. (Wed,) studied this question.