Modeling articulatory representations is critical to the scientific study of speech production, including its relation to speech acoustics. However, discretizing articulatory dynamics in continuous speech has proven computationally taxing. For example, segmentation analyses of real-time vocal tract images that use contour-tracking methods, while successful, require manual creation of templates and human-supervised assessment [e.g., Bresch and Narayanan (2009). IEEE Trans. Med. Imaging 28(3), 323–338]. In this paper, we use Segment Anything Model 2 (SAM 2.0) [Ravi et al. (2024). arXiv:2408.00714] to efficiently segment critical articulators in real-time magnetic resonance imaging speech production data, without fine-tuning and with global nonlinear image filtering. We examine such systems' ability to segment speech dynamics, which have both language- and subject-specific characteristics.
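The abstract does not say which global nonlinear filter is applied before segmentation. As a minimal sketch only, the block below assumes a power-law (gamma) intensity transform, one common example of a global nonlinear filter; the function name `gamma_correct`, its parameters, and the sample frame are hypothetical and are not taken from the paper.

```python
def gamma_correct(frame, gamma=0.5, max_value=255):
    """Apply a global power-law (gamma) transform to a grayscale frame.

    ASSUMPTION: gamma correction stands in for the unspecified
    "global nonlinear image filtering" mentioned in the abstract.
    `frame` is a list of rows of integer intensities in [0, max_value].
    The same nonlinear mapping is applied to every pixel, which is
    what makes the filter global.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute the output value once for each possible input intensity.
    lookup = [
        round(max_value * (value / max_value) ** gamma)
        for value in range(max_value + 1)
    ]
    return [[lookup[pixel] for pixel in row] for row in frame]


# Hypothetical 2x3 frame with made-up intensities (not real MRI data).
frame = [[0, 16, 64], [128, 200, 255]]
enhanced = gamma_correct(frame, gamma=0.5)
```

With `gamma` below 1, dark and mid-range intensities are brightened while the endpoints 0 and `max_value` stay fixed, which may make low-contrast tissue boundaries easier to pick out before a frame is passed to a segmentation model.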
Haley Hsu
Kyle Kai Ho Ng
Sultana A. Qureshi
JASA Express Letters
University of Southern California
Hsu et al. studied this question.
www.synapsesocial.com/papers/69aa7160531e4c4a9ff5b734 — DOI: https://doi.org/10.1121/10.0042820