August 22, 2024

EarSpeech: Exploring In-Ear Occlusion Effect on Earphones for Data-efficient Airborne Speech Enhancement

Abstract

Earphones have become a popular voice input and interaction device. However, airborne speech is susceptible to ambient noise, so the quality and intelligibility of speech captured on earphones must be improved in noisy conditions. Because the dual-microphone structure (i.e., outer and in-ear microphones) has been widely adopted in earphones, especially ANC earphones, we design EarSpeech, which exploits in-ear acoustic sensing as a complementary modality to enable airborne speech enhancement. The key idea of EarSpeech is that in-ear speech is less sensitive to ambient noise and is correlated with airborne speech. However, due to the occlusion effect, in-ear speech has limited bandwidth, making it challenging to correlate directly with full-band airborne speech. Therefore, we exploit the occlusion effect to carry out theoretical modeling and quantitative analysis of this cross-channel correlation and study how to leverage it for speech enhancement. Specifically, we design a series of methodologies, including data augmentation, deep learning-based fusion, and a noise mixture scheme, to improve the generalization, effectiveness, and robustness of EarSpeech, respectively. Lastly, we conduct real-world experiments to evaluate the performance of our system. EarSpeech achieves average improvement ratios of 27.23% and 13.92% in terms of PESQ and STOI, respectively, and significantly improves SI-SDR by 8.91 dB. Benefiting from data augmentation, EarSpeech achieves comparable performance with a small-scale dataset that is 40 times smaller than the original dataset. In addition, we validate generalization across different users, speech content, and language types, as well as robustness in the real world, via comprehensive experiments. The audio demo of EarSpeech is available at https://github.com/EarSpeech/earspeech.github.io/.
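
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how SI-SDR (in dB) and a relative improvement ratio (in percent) are commonly computed. It is not the authors' evaluation code. The function names `si_sdr` and `improvement_ratio` are hypothetical, and the definition of the improvement ratio as `(after - before) / before` is an assumption, since the abstract does not state the formula.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have equal length")
    # Remove the mean so that a DC offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    ref_energy = sum(b * b for b in r)
    if ref_energy == 0:
        raise ValueError("reference signal has zero energy")
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = sum(a * b for a, b in zip(e, r)) / ref_energy
    target = [alpha * b for b in r]
    # Everything in the estimate that is not the scaled target counts as distortion.
    residual = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0:
        return math.inf
    return 10 * math.log10(target_energy / residual_energy)


def improvement_ratio(before, after):
    """Relative improvement in percent (assumed definition, not from the paper)."""
    return 100.0 * (after - before) / before


# Toy signals: a clean reference and an estimate carrying a small additive error.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 0.9, -1.1]
score_db = si_sdr(estimate, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why SI-SDR is often preferred over plain SDR when the output gain of an enhancement system is arbitrary.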

Cite this study

Han et al. (2024) studied this question.

synapsesocial.com/papers/68e5b4e9b6db64358754db90 — DOI: https://doi.org/10.1145/3678594

Also consider

Synapse has enriched 5 closely related papers on similar research questions. Consider them for comparative context:

A multi-band spectral subtraction method for enhancing speech corrupted by colored noise · 2002 · 551 citations
Raw waveform-based speech enhancement by fully convolutional networks · 2017 · 213 citations
Relationship between phoneme-level spectral acoustics and speech intelligibility in healthy speech: a systematic review · 2021 · 17 citations
Quantifying the Causal Effect of Individual Mobility on Health Status in Urban Space · 2021 · 64 citations
Librispeech: An ASR corpus based on public domain audio books · 2015 · 5,976 citations

Authors

Feiyu Han

Nanjing University of Information Science and Technology

Panlong Yang

Nanjing University of Information Science and Technology

You Zuo

University of Science and Technology of China

Journals

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies

Institutions

University of Science and Technology of China

Nanjing University of Information Science and Technology

Suzhou University of Science and Technology
