To improve speech clarity in earable voice interaction scenarios, dual-microphone speech enhancement (SE) techniques that combine in-ear and out-ear microphones have attracted significant attention from the research community. However, existing dual-microphone SE techniques rest on a strong assumption: that high-quality in-ear speech (the auxiliary modality) can provide effective complementary information to the target airborne speech (the primary modality). This assumption limits their adaptability in the real world. In this work, we build on a key observation: air pressure imbalance caused by ear canal deformation (ECD) degrades the quality of in-ear speech, which in turn significantly degrades speech enhancement performance. To address this bottleneck, we design a quality-aware speech enhancement solution, named QuaSE, which efficiently and dynamically fuses complementary information by assessing quality variations in in-ear speech. Additionally, based on an analysis of the spectral distortion induced by ECD, we design a training strategy comprising quality-aware data selection and content-aware augmentation to improve the generalization capability of QuaSE. Extensive experiments demonstrate that QuaSE outperforms state-of-the-art techniques by 6.27%, 4.54%, 14.90%, and 11.93% in terms of PESQ, STOI, SI-SDR, and SegSNR, respectively. Moreover, we validate that the proposed quality-aware fusion strategy can be modularly integrated into other sensing tasks, improving their fusion performance.
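For readers unfamiliar with the reported metrics, the following is a minimal sketch of SI-SDR (scale-invariant signal-to-distortion ratio) using its commonly cited definition. It is not the authors' evaluation code. The function name `si_sdr`, the `eps` stabilizer, and the mean-removal step are assumptions made here for illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove the mean so a DC offset does not influence the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)    # stand-in for one second of clean speech
    residual = rng.standard_normal(16000)  # stand-in for residual noise
    print(si_sdr(clean + 0.1 * residual, clean))
    print(si_sdr(clean + 0.5 * residual, clean))
```

A higher value means the enhanced signal is closer to the clean reference. Because the reference is rescaled by projection, multiplying the estimate by a constant gain leaves the score unchanged.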
Han et al. (Mon,) studied this question.