June 18, 2026 · Open Access

A dataset of multimodal articulatory physiological data for Mandarin Chinese based on ultrasound tongue imaging

Key points

Aim: build a comprehensive multimodal dataset of Mandarin Chinese articulatory physiology based on ultrasound tongue imaging.
The dataset contains 1,024 pronunciation units, covering commonly used initial-final syllable combinations across the four tones.
Each unit combines four modalities: text corpus, speech audio, lip video, and ultrasound tongue imaging.
Manual verification plus machine screening removed invalid samples such as non-standard pronunciations, blurry images, and distorted audio.
Supports basic research on the physiological mechanisms of Mandarin pronunciation, tone change, and second language acquisition.
Applications include speech synthesis and recognition, diagnosis and rehabilitation of speech disorders, modeling of pronunciation mechanisms, and training of AI speech models.
Offers a reference for cross-language comparative studies of pronunciation physiology.
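The key points describe each pronunciation unit as one syllable-tone pairing that carries four modalities. The sketch below models such a unit as a record, assuming a hypothetical layout: the summary does not describe how the dataset is stored, so the class `PronunciationUnit`, the helper `make_unit`, the directory names, and the file extensions are illustrative only, not the dataset's actual structure.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# Hypothetical record layout: the summary names the four modalities but not
# how the dataset is stored, so all field, directory, and file names are illustrative.


@dataclass(frozen=True)
class PronunciationUnit:
    initial: str      # syllable initial, e.g. "m" ("" for zero-initial syllables)
    final: str        # syllable final, e.g. "a"
    tone: int         # Mandarin tone condition, 1-4
    text: str         # text-corpus entry, e.g. the character 妈
    audio: Path       # speech audio recording
    lip_video: Path   # lip video clip
    ultrasound: Path  # ultrasound tongue-imaging sequence

    def __post_init__(self) -> None:
        # Reject anything outside the four tone conditions.
        if self.tone not in (1, 2, 3, 4):
            raise ValueError(f"tone must be 1-4, got {self.tone}")

    @property
    def key(self) -> str:
        """Pinyin with a tone number, e.g. 'ma1'."""
        return f"{self.initial}{self.final}{self.tone}"

    def media(self) -> dict[str, Path]:
        """The three recorded modalities; the fourth (text) is stored inline."""
        return {"audio": self.audio, "lip_video": self.lip_video,
                "ultrasound": self.ultrasound}


def make_unit(initial: str, final: str, tone: int, text: str,
              root: Path = Path("data")) -> PronunciationUnit:
    """Build a unit whose media files follow a hypothetical <root>/<modality>/<key> layout."""
    key = f"{initial}{final}{tone}"
    return PronunciationUnit(
        initial, final, tone, text,
        audio=root / "audio" / f"{key}.wav",
        lip_video=root / "lip_video" / f"{key}.mp4",
        ultrasound=root / "ultrasound" / f"{key}.mp4",
    )


unit = make_unit("m", "a", 1, "妈")
print(unit.key, unit.media()["audio"])  # ma1 data/audio/ma1.wav (POSIX paths)
```

Keying units on pinyin plus a tone number keeps the syllable and tone condition recoverable from every file name, which is convenient when grouping units by tone; a different key scheme would work just as well.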

Abstract

Articulatory physiological data are the core foundation of Mandarin Chinese phonetic research and speech engineering. Existing multimodal pronunciation physiological datasets for Mandarin Chinese, however, suffer from incomplete coverage, single-modality acquisition, and a lack of synchronization, which make it difficult to meet the requirements of high-precision research. To address this issue, this study constructs a multimodal pronunciation physiological dataset of Mandarin Chinese based on ultrasound tongue imaging, remedying the limited fusion of multidimensional pronunciation physiological information in existing datasets. The dataset covers the commonly used valid syllables formed by combining initials and finals under the four tone conditions, yielding 1,024 complete pronunciation units. The multimodal data consist of four parts: text corpora, speech audio, lip video, and ultrasound tongue images, which together capture both the physiological movements and the acoustic output of the pronunciation process. During quality control, manual verification is combined with machine screening to eliminate invalid data such as non-standard pronunciations, blurry images, and distorted audio, ensuring a high-quality final dataset. The dataset provides data support for basic research on the physiological mechanisms of Mandarin Chinese pronunciation, the rules of tone change, and second language acquisition, and it also has applications in speech synthesis and recognition, the diagnosis and rehabilitation of speech disorders, the modeling of pronunciation mechanisms, and the training of artificial intelligence speech models. In addition, it offers a reference for cross-language comparative studies of pronunciation physiology.
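The abstract names the kinds of invalid data removed (blurry images, audio distortion) but not which automatic checks were used. As an illustration only, the sketch below swaps in two common heuristics: the variance of a discrete Laplacian as an image-sharpness score, and the fraction of full-scale samples as a clipping measure. The functions `laplacian_variance`, `clipping_ratio`, and `passes_machine_screen` and their default thresholds are hypothetical, and the sketch does not address non-standard pronunciation.

```python
from typing import Sequence

Frame = Sequence[Sequence[float]]  # one grayscale frame as rows of pixel intensities


def laplacian_variance(frame: Frame) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp intensity changes, a common blur heuristic.
    """
    h, w = len(frame), len(frame[0])
    if h < 3 or w < 3:
        raise ValueError("frame must be at least 3x3")
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def clipping_ratio(samples: Sequence[float], full_scale: float = 1.0,
                   tol: float = 1e-4) -> float:
    """Fraction of samples at full scale, assuming audio normalised to [-1, 1]."""
    if not samples:
        raise ValueError("empty audio")
    clipped = sum(1 for s in samples if abs(s) >= full_scale - tol)
    return clipped / len(samples)


def passes_machine_screen(frames: Sequence[Frame], samples: Sequence[float],
                          min_sharpness: float = 10.0,
                          max_clipping: float = 0.01) -> bool:
    """Keep a sample only if every frame is sharp enough and the audio is not clipped.

    Both thresholds are placeholders that would need tuning on real recordings.
    """
    sharp = all(laplacian_variance(f) >= min_sharpness for f in frames)
    return sharp and clipping_ratio(samples) <= max_clipping
```

For example, a uniform frame scores 0.0 and fails the sharpness check, while a high-contrast checkerboard scores far above the placeholder threshold. Because the Laplacian variance scales with the pixel intensity range, any value of `min_sharpness` has to be chosen for the image encoding in use; ultrasound frames are also inherently speckled, so a plain sharpness score would likely need calibration against manually rejected examples.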
