May 14, 2026

Flexible source separation for decomposing acoustic scenes and events

Key Points

This research aims to improve sound source isolation in acoustic scenes using flexible specifications for the sources of interest.
The talk discusses the advantages and disadvantages of natural language queries for audio source extraction.
It presents Task-aware Unified Source Separation (TUSS), a recent prompt-based separation model.
TUSS decomposes a sound scene into a flexible number of output sources.
The output sources can include multiple sources of the same type.

Abstract

In addition to detection, classification, and localization of sound events, the ability to isolate the sound of one or multiple sound sources in an acoustic scene is important for downstream applications such as assisted listening and virtual/augmented reality. Because of the wide variety of possible sound sources we may want to isolate, the ability to flexibly specify the source(s) of interest (e.g., via natural language queries) is necessary for most practical applications. In this talk, I will first discuss the advantages and disadvantages of natural language queries for audio source extraction. I will then discuss Task-aware Unified Source Separation (TUSS), a recent prompt-based separation model that can decompose a sound scene into a flexible number of output sources, including multiple sources of the same type.
