Despite recent advances in brain-computer interfaces (BCIs) for speech restoration, existing systems remain invasive, costly, and inaccessible to individuals with congenital mutism or neurodegenerative disease. We present a proof-of-concept pipeline that synthesizes personalized speech directly from real-time magnetic resonance imaging (rtMRI) of the vocal tract, without requiring acoustic input. A Pix2Pix conditional generative adversarial network (GAN) maps segmented rtMRI frames to articulatory class representations. A convolutional neural network (CNN) that models the articulatory-to-acoustic relationship then transforms these representations into synthetic audio waveforms. The synthesized waveforms are rendered as audible speech and evaluated with speaker-similarity metrics derived from Resemblyzer embeddings. Although preliminary, our results suggest that even silent articulatory motion encodes enough information to approximate a speaker's vocal characteristics. This offers a non-invasive direction for future speech restoration in individuals who have lost their voice or never developed one.
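The abstract does not state which speaker-similarity formula is used. A common choice for comparing speaker embeddings such as Resemblyzer's is cosine similarity. The sketch below assumes that metric. The helper name `speaker_similarity` and the toy three-dimensional vectors are hypothetical stand-ins for real embeddings. In practice, one embedding would come from the target speaker's recorded voice and the other from the synthesized waveform, for example via Resemblyzer's `VoiceEncoder.embed_utterance`, which returns unit-length vectors. For unit-length vectors, the cosine reduces to a plain dot product.

```python
import math
from typing import Sequence


def speaker_similarity(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings.

    Hypothetical helper: assumes the similarity metric is cosine similarity,
    which the source does not confirm. Returns a value in [-1, 1]; higher
    means the two voices are closer in embedding space.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors only; real speaker embeddings have far more dimensions.
    reference = [0.6, 0.8, 0.0]   # stand-in for the real speaker's embedding
    synthetic = [0.6, 0.0, 0.8]   # stand-in for the synthesized audio's embedding
    print(round(speaker_similarity(reference, synthetic), 2))  # prints 0.36
```

Normalizing inside the helper keeps it correct for any embedding source. For embeddings that are already unit-length, the division is effectively by 1.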
Mahdi Saleh
University of Balamand
DOI: https://doi.org/10.1101/2025.08.22.25334256