Multi-microphone noise reduction remains a central topic in audio signal processing. Traditionally, spatial filters such as the Minimum Variance Distortionless Response (MVDR) beamformer and the Multichannel Wiener Filter (MWF) are designed to meet specific optimization criteria. Incorporating acoustic propagation models has been shown to enhance their performance significantly. This talk begins with a brief overview of beamforming for speech enhancement, emphasizing the role of acoustic modeling. We then introduce two recent deep neural network (DNN)-based approaches for multi-microphone processing. The first method, peerRTF, adheres to the MVDR criterion, with the Relative Transfer Function (RTF) serving as the steering vector. By learning the structure of the RTF manifold with a graph convolutional network (GCN), the method enables robust inference of RTFs within a spatially constrained region, leading to enhanced beamformer performance. The second method, ExNet-BF + PF (Explainable DNN-based Beamformer with Postfilter), directly estimates beamformer weights using a DNN while preserving the interpretability and constraints of classical designs. It employs a two-stage architecture consisting of a multichannel spatial filter with time-invariant weights followed by a time-varying single-channel postfilter. We further analyze how spatial cues are exploited during inference, providing insights into the learned beam patterns. Audio demonstrations will accompany the presentation.
Gannot et al. (Wed,) studied this question.
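
For readers unfamiliar with the MVDR criterion mentioned in the abstract, the following is a minimal sketch of the textbook closed-form solution for a single frequency bin, w = Phi^{-1} h / (h^H Phi^{-1} h), where h is an RTF vector normalized to a reference microphone and Phi is the noise covariance matrix. It is not the peerRTF or ExNet-BF + PF implementation: the function name `mvdr_weights`, the microphone count, and the randomly generated RTF and covariance are all assumptions made purely for illustration.

```python
import numpy as np


def mvdr_weights(noise_cov, rtf):
    """Textbook MVDR weights for one frequency bin.

    w = Phi^{-1} h / (h^H Phi^{-1} h), with Phi the noise covariance
    and h the RTF (steering) vector.
    """
    # Solve Phi x = h rather than forming the explicit inverse.
    x = np.linalg.solve(noise_cov, rtf)
    # Normalize so that the response toward h is exactly one.
    return x / (rtf.conj() @ x)


rng = np.random.default_rng(0)
num_mics = 4  # illustrative array size, not taken from the talk

# Hypothetical RTF vector: the reference microphone (index 0) has unit gain.
rtf = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)
rtf = rtf / rtf[0]

# Hypothetical noise covariance, Hermitian positive definite by construction.
a = rng.standard_normal((num_mics, num_mics)) \
    + 1j * rng.standard_normal((num_mics, num_mics))
noise_cov = a @ a.conj().T + num_mics * np.eye(num_mics)

w = mvdr_weights(noise_cov, rtf)

# Distortionless constraint: w^H h should equal 1.
distortionless_response = w.conj() @ rtf

# Residual noise power at the beamformer output: w^H Phi w.
output_noise_power = np.real(w.conj() @ noise_cov @ w)

# Noise power at the reference microphone alone, for comparison.
reference_noise_power = np.real(noise_cov[0, 0])
```

Because selecting only the reference microphone also satisfies the distortionless constraint when h is normalized to that microphone, `output_noise_power` should never exceed `reference_noise_power`. In practice the accuracy of the RTF estimate determines how well the constraint protects the desired speech, which is the motivation the abstract gives for learning the RTF manifold.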