What question did this study set out to answer?

This research aims to automate the segmentation of vocal tract images using advanced modeling techniques.

March 6, 2026 | Open Access

Automation of real-time vocal tract image segmentation with SAM 2.0 and morphological operation implementation

Key Points

Applied Segment Anything Model 2 (SAM 2.0) for image segmentation
Analyzed real-time magnetic resonance imaging data
Utilized global nonlinear image filtering techniques
Achieved efficient segmentation of articulators without manual template creation
Demonstrated potential for real-time analysis of speech dynamics
Addressed challenges associated with language- and subject-specific variations

Abstract

Modeling articulatory representations is critical to the scientific study of speech production, including its relation to speech acoustics. However, discretizing articulatory dynamics in continuous speech has proven computationally taxing. For example, segmentation analyses of real-time vocal tract images deploying contour-tracking methods, while successful, require manual creation of templates and human-supervised assessment [e.g., Bresch and Narayanan (2009). IEEE Trans. Med. Imaging 28(3), 323–338]. In this paper, we utilize Segment Anything Model 2 (SAM 2.0) [Ravi et al. (2024). arXiv:2408.00714] to efficiently segment critical articulators in real-time magnetic resonance imaging speech production data without fine-tuning and with global nonlinear image filtering. The goal is to examine such systems' ability to segment speech dynamics, which have both language- and subject-specific characteristics.
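The title refers to morphological operations, but the abstract does not say which ones are used or where they are applied. As a purely illustrative sketch, the following shows one common operation of this kind, a binary opening (erosion followed by dilation) with a 3x3 window, applied to a small mask. The function names, the window size, the zero-padded border, and the toy mask are all assumptions made for illustration; this is not the authors' pipeline and it does not call SAM 2.0.

```python
# Illustrative sketch only; not the method described in the paper.
# A binary mask is a list of rows, each row a list of 0/1 values.

def _apply_3x3(mask, reduce_fn):
    """Reduce every 3x3 neighbourhood with reduce_fn (pixels outside the mask count as 0)."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                mask[rr][cc] if 0 <= rr < height and 0 <= cc < width else 0
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
            ]
            out[r][c] = reduce_fn(window)
    return out

def erode(mask):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return _apply_3x3(mask, min)

def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return _apply_3x3(mask, max)

def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the window."""
    return dilate(erode(mask))

# Toy mask: a 3x3 foreground block plus one isolated noise pixel.
toy_mask = [[0] * 6 for _ in range(6)]
for r in range(1, 4):
    for c in range(1, 4):
        toy_mask[r][c] = 1
toy_mask[5][5] = 1

cleaned = opening(toy_mask)
for row in cleaned:
    print("".join(str(v) for v in row))
```

In this toy case the opening removes the isolated pixel and leaves the 3x3 block intact. A production workflow would more likely rely on an established image-processing library than on hand-written loops.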
