This study investigates how first language (L1) phonological systems affect the stability of acoustic-to-articulatory inversion (AAI) in second language (L2) English speech using a speech foundation model-based approach. We leverage an AAI system built on WavLM-large, pretrained on 94 000 h of English audio from diverse domains and further trained to predict articulatory trajectories using electromagnetic articulography data from a native English speaker. This supervision enables the model to approximate vocal tract movements, but it also encodes English L1 articulatory priors, which limits generalization to diverse L2 backgrounds. We hypothesize that speakers of languages with rhythmic structures and segmental inventories similar to English will exhibit more stable AAI, while speakers of more divergent L1s will show greater trajectory mismatch. Inversion performance was evaluated with a round-trip resynthesis procedure that compares inferred articulatory trajectories before and after resynthesis, on two publicly available corpora (L2-ARCTIC and CMU ARCTIC). Results show systematic variation across L1s. Speakers of Germanic languages (English varieties, German) tend to yield more stable inversion, while speakers of syllable-timed (Spanish, Korean), tonal (Mandarin, Vietnamese), or laryngeally complex (Arabic, Hebrew, Indian varieties) languages show greater mismatch. Our findings offer evidence of L1-driven articulatory biases, highlighting the need for typologically informed approaches to articulatory supervision. Work supported by IARPA.
Lee et al. (Wed,) studied this question.