We propose a real-time speech processing system for hearing assistance applications that combines speech enhancement (SE) and voice conversion (VC) in a lightweight pipeline. The system uses a low-latency SE module that requires no deep learning to extract the target speaker’s voice from a multichannel signal recorded in noisy, multi-speaker environments, followed by a VC module that transforms the extracted voice to improve intelligibility and listening comfort. It is designed to work with a distributed assistive device equipped with multi-sensor input, including microphone arrays at both ears and additional microphones in a smartphone at hand. Motivated by the need to improve the intelligibility of a specific speaker in hearing aid scenarios, the system focuses on extracting and converting the target voice in challenging auditory scenes. Evaluation experiments simulating hearing aid use confirmed that the system operates in real time and revealed improvements in subjective listening comfort, including perceived vocal softness. Furthermore, we investigated how the parameter settings of the front-end SE module affect downstream VC performance and discussed the optimal configuration for the overall pipeline.
Seki et al. studied this question.