The application of the electrolarynx (EL) for laryngectomees who speak tonal languages remains challenging because tonal completion is difficult to achieve without useful fundamental frequency (F0) information. This study proposes a novel Mandarin EL speech enhancement framework that integrates prior F0 information provided by finger movements with the Cycle-Consistent Adversarial Network (CycleGAN) and the Continuous Wavelet Transform (CWT). For prosody modeling, we exploit the hierarchical structure inherent in Mandarin prosody by using CWT decomposition coefficients as the feature representation of F0. For spectral conversion, we extract Mel-frequency cepstral coefficients (MCEP) as spectral features. The two feature sets are trained separately using the CycleGAN model. Acoustic feature analysis indicates that, after conversion, the four tones are closer to normal tones in both F0 value and F0 contour. The spectrogram of the converted speech is also more similar to that of normal speech, and the conversion compensates for the missing low-frequency energy below 500 Hz. Both subjective and objective evaluations demonstrate the effectiveness of the proposed method for Mandarin EL speech enhancement. This study also provides a novel approach to EL speech enhancement in other tonal languages, and it may offer valuable insights and guidance for the future development of tonal EL devices and EL speech enhancement.
Zhou et al. studied this question.
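To make the idea of using CWT decomposition coefficients as an F0 representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes an already interpolated (gap-free) log-F0 contour, a Mexican hat mother wavelet, and scales spaced one octave apart; the function names `mexican_hat` and `cwt_f0`, the number of scales, and the base scale are all hypothetical choices made for illustration.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, up to a constant factor."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def cwt_f0(log_f0, num_scales=5, base_scale=2.0):
    """Decompose a log-F0 contour into multi-scale CWT coefficients.

    Assumption: `log_f0` is a 1-D, gap-free contour (unvoiced frames
    already interpolated). Returns an array of shape (num_scales, T),
    ordered from fine scales to coarse scales.
    """
    x = np.asarray(log_f0, dtype=float)
    # Normalise to zero mean and unit variance before the transform.
    x = (x - x.mean()) / (x.std() + 1e-8)
    n_frames = len(x)
    coeffs = np.empty((num_scales, n_frames))
    for i in range(num_scales):
        scale = base_scale * 2.0 ** i  # scale in frames, one octave apart
        # Truncate the kernel so it never exceeds the contour length.
        half = min(int(5 * scale), (n_frames - 1) // 2)
        t = np.arange(-half, half + 1) / scale
        kernel = mexican_hat(t) / np.sqrt(scale)
        coeffs[i] = np.convolve(x, kernel, mode="same")
    return coeffs


# Usage on a synthetic contour (hypothetical values, in Hz).
f0 = 120.0 + 30.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, 200))
coeffs = cwt_f0(np.log(f0))
print(coeffs.shape)  # (5, 200)
```

Each row of `coeffs` captures F0 variation at one temporal scale, so fine scales follow syllable-level tone movement and coarse scales follow phrase-level trends. Under the assumptions above, these per-frame coefficient vectors are the kind of feature that could be passed to a conversion model in place of a raw F0 value.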