What question did this study set out to answer?

The study asks whether speech enhancement can recover clean speech from noisy recordings while preserving intelligibility and speaker identity, without the heavy inference cost of diffusion models.

May 15, 2026 · Open Access

Efficient Speech Enhancement via Flow Matching with Gated Bidirectional Mamba2

Key Points

Proposes a speech enhancement framework that combines flow matching with a gated bidirectional Mamba2 backbone.
Introduces a DiMamba block that captures past and future context through shared-weight bidirectional state-space modeling with adaptive gating.
Evaluates the method on the DNS Challenge test set and additional VoiceBank test data.
Reaches a real-time factor of 0.31, more than five times faster than the diffusion baselines.
Achieves a word error rate of 4.7% and a quality mean opinion score of 3.58.
Shows strong perceptual quality and speaker preservation at substantially lower inference cost.
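
To put the real-time factor in context: it is conventionally the time spent processing divided by the duration of the audio processed, so any value below 1 means enhancement runs faster than playback. The sketch below works through that arithmetic. The timing numbers are hypothetical, chosen only to reproduce the reported 0.31, and reading "five times faster" as a ratio of real-time factors is an assumption; the summary does not give the hardware or measurement protocol.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timing, chosen only to reproduce the reported figure:
# 3.1 s of compute for a 10 s utterance.
rtf = real_time_factor(processing_seconds=3.1, audio_seconds=10.0)

# Assumption: "more than five times faster" is a ratio of real-time factors,
# so the diffusion baselines would sit above 5 * 0.31.
implied_baseline_rtf_floor = 5 * rtf

print(f"RTF = {rtf:.2f}")
print(f"Implied diffusion-baseline RTF > {implied_baseline_rtf_floor:.2f}")
```

Under that reading, the diffusion baselines would take longer than the audio itself to process, while the proposed model finishes in roughly a third of the clip length.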

Abstract

Speech enhancement (SE) aims to recover clean speech from noisy speech while preserving intelligibility, speaker identity, and runtime efficiency. Existing language-model (LM)-based methods may lose fine acoustic details due to discretization, whereas diffusion models often require many iterative denoising steps. This study proposes an efficient speech enhancement framework based on flow matching and a gated bidirectional Mamba2 backbone. The model predicts a continuous velocity field in the Mel-spectrogram domain and introduces a DiMamba block that captures past and future context through shared-weight bidirectional state-space modeling with adaptive gating. Experiments on the DNS Challenge test set and additional VoiceBank test data show that the proposed method achieves strong perceptual quality and speaker preservation while substantially reducing inference cost. The model reaches a real-time factor of 0.31, more than five times faster than diffusion baselines, and achieves a word error rate of 4.7% and a quality mean opinion score of 3.58. These results indicate that flow matching combined with gated bidirectional Mamba2 provides an effective quality–efficiency trade-off for offline speech enhancement.
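
The efficiency argument in the abstract rests on how a flow matching model generates output: it integrates a learned velocity field over a fixed time interval, which can take far fewer steps than iterative diffusion denoising. The following is a minimal sketch of that sampling loop, not the authors' implementation. It assumes a straight-line probability path and a plain Euler solver, neither of which is stated in the abstract, and `make_toy_velocity` is a hypothetical closed-form stand-in for the trained Mamba2 network. The three-value vectors stand in for a Mel-spectrogram frame.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity_fn: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(clean: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for the trained network.

    On a straight-line path toward `clean`, the velocity at point x and
    time t is (clean - x) / (1 - t). A real model would predict this
    from the noisy input instead of knowing `clean` in advance.
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ci - xi) / (1.0 - t) for ci, xi in zip(clean, x)]
    return velocity


# Toy values only: a three-bin "frame" and an arbitrary starting point.
clean_frame = [0.2, -1.0, 0.5]
start_frame = [1.5, 0.3, -0.7]

enhanced = euler_sample(make_toy_velocity(clean_frame), start_frame, num_steps=8)
print(enhanced)
```

With an exact velocity field the loop lands on the clean frame after only eight steps. A learned field is approximate, so the step count becomes a knob that trades quality against inference time.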

Cite This Study

Yuan et al. studied this question.

synapsesocial.com/papers/6a06b971e7dec685947ac24f
https://doi.org/10.3390/app16104757
