What question did this study set out to answer?

How to develop an efficient acoustic echo cancellation (AEC) method, based on deep learning, that enhances speech quality in full-duplex systems.

March 18, 2026 | Open Access

Fully Convolutional Recurrent Network with Multiple Sub-Filters for Monophonic Acoustic Echo Cancellation

Key Points

Aimed to develop an efficient deep learning-based acoustic echo cancellation (AEC) method that enhances speech quality in full-duplex systems.
Proposed a fully convolutional recurrent network combined with multiple sub-filters.
Evaluated performance on the ICASSP 2022 AEC challenge and DNS-CHiME3 datasets.
Used bidirectional long short-term memory units and channel-wise attention mechanisms to adaptively tune the filter step size.
Achieved 38-40 dB echo return loss enhancement (ERLE) at 40 dB signal-to-noise ratio (SNR).
Found a 12-15 dB ERLE improvement over conventional normalized least mean squares (NLMS) baselines.
Demonstrated a signal-to-distortion ratio of 38.9 dB and 30-40% faster convergence than traditional algorithms.
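
The dB figures above can be read through the standard definition of ERLE, sketched below. The symbols are assumptions introduced here for illustration and are not taken from the paper: d(n) is the microphone signal containing the echo, and e(n) is the residual after cancellation. The last line checks the convergence times quoted in the abstract (1.10 s vs. 1.65 s) against the stated 30-40% range.

```latex
% ERLE: ratio of echo power before cancellation to residual power after it
\mathrm{ERLE} = 10 \log_{10} \frac{\mathbb{E}\left[d^{2}(n)\right]}{\mathbb{E}\left[e^{2}(n)\right]} \ \text{dB}

% A gain of \Delta dB in ERLE lowers the residual echo power by a factor 10^{\Delta/10}
\Delta = 12\ \text{dB} \Rightarrow 10^{1.2} \approx 15.8, \qquad
\Delta = 15\ \text{dB} \Rightarrow 10^{1.5} \approx 31.6

% Relative reduction in convergence time
\frac{1.65 - 1.10}{1.65} \approx 0.33
```

Under this reading, a 12-15 dB ERLE gain means the residual echo power is roughly 16 to 32 times lower than with the NLMS baseline.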

Abstract

Acoustic echo cancellation (AEC) remains a critical challenge in full-duplex communication systems, where acoustic coupling between loudspeakers and microphones significantly degrades speech quality. Conventional adaptive filtering methods, such as the normalized least mean squares (NLMS) and recursive least squares (RLS) algorithms, are computationally efficient but suffer from slow convergence, sensitivity to nonlinear distortions, and poor adaptability under time-varying acoustic conditions. Deep neural network-based AEC approaches suppress residual echo more effectively, yet their high computational complexity constrains real-time applicability. This paper proposes a fully convolutional recurrent network-based multiple sub-filter (FCRN-MSF) framework that combines the efficiency of adaptive filtering with the dynamic modelling capability of deep learning, achieving a 12-15 dB improvement in echo return loss enhancement (ERLE) and a lower steady-state mean square error than state-of-the-art baselines. The hybrid architecture employs multiple sub-filters (MSFs) to capture multi-path echo characteristics across diverse delay distributions, while an FCRN-based step-size estimator adaptively tunes the learning rate using temporal-spatial correlations derived from bidirectional long short-term memory units and channel-wise attention mechanisms. Extensive evaluations on the ICASSP 2022 AEC challenge and DNS-CHiME3 datasets show that the proposed method achieves 38-40 dB ERLE at a 40 dB signal-to-noise ratio (SNR), a 12-15 dB improvement over NLMS baselines; a perceptual evaluation of speech quality (PESQ) score of 3.80, a 0.10-0.15 point improvement; a signal-to-distortion ratio of 38.9 dB; and a 30-40% shorter convergence time (1.10 s vs. 1.65 s) than traditional AEC algorithms. These results make it suitable for real-time deployment in resource-constrained full-duplex communication systems.
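
For orientation, the conventional NLMS baseline that the abstract compares against can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's FCRN-MSF method: it uses a single filter with a fixed step size `mu`, whereas the proposed framework uses multiple sub-filters and a network that estimates the step size. The function names, the synthetic echo path, and all parameter values are hypothetical.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, num_taps=64, mu=0.5, eps=1e-8):
    """Fixed-step NLMS echo canceller; returns the residual (error) signal."""
    w = np.zeros(num_taps)      # current estimate of the echo path
    x_buf = np.zeros(num_taps)  # most recent far-end samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_hat = w @ x_buf                 # predicted echo sample
        err[n] = mic[n] - echo_hat           # residual after cancellation
        # Normalised update: the step is scaled by the input energy.
        w += mu * err[n] * x_buf / (x_buf @ x_buf + eps)
    return err


def erle_db(mic, err):
    """ERLE in dB: microphone (echo) power over residual power."""
    return 10.0 * np.log10(np.sum(mic ** 2) / np.sum(err ** 2))


def demo(seed=0, n=8000, num_taps=64):
    """Synthetic echo-only test (assumed setup: no near-end speech, no noise)."""
    rng = np.random.default_rng(seed)
    far_end = rng.standard_normal(n)
    # Hypothetical echo path: random taps with an exponential decay.
    echo_path = rng.standard_normal(num_taps) * np.exp(-0.1 * np.arange(num_taps))
    mic = np.convolve(far_end, echo_path)[:n]
    err = nlms_echo_canceller(far_end, mic, num_taps=num_taps)
    tail = slice(n // 2, None)  # measure ERLE after the filter has converged
    return erle_db(mic[tail], err[tail])


if __name__ == "__main__":
    print(f"Steady-state ERLE: {demo():.1f} dB")
```

Because the synthetic case is linear and noise-free, the ERLE it reports is far higher than anything achievable on real recordings; the sketch is meant only to show where the step size enters the update, which is the quantity the FCRN-based estimator is described as tuning.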
