This paper has been accepted for presentation at Speech Prosody 2026. ABSTRACT: Although automatic methods of prosodic segmentation have recently been proposed, their effect on the construction of datasets for TTS training is still unknown. For the first time in Brazilian Portuguese, we investigated this effect on the CORAA NURC-SP Minimal Corpus dataset, consisting of ≈17h35m of spontaneous speech, which was segmented using an automatic prosodic segmenter to train a speech synthesis model. We compared natural speech with speech synthesized by FastSpeech2 under three segmentation conditions: manual prosodic segmentation, WhisperX automatic segmentation, and a machine learning prosodic segmentation method. The acoustic prosodic analysis revealed that, with respect to the representation of the F0 curve, speech synthesized from a dataset with automatic prosodic segmentation approximates speech generated from manually segmented data. Nevertheless, in a phonological analysis, synthetic speech exhibited higher variability in tonal events and prosodic focus, as was also observed by Hu et al. (2024) for Southern British English. Furthermore, 70% of synthesized nuclear contours differed from the nuclear contours of natural speech. We attribute these issues, among other factors, to the fact that automatic segmentation, unlike manual segmentation, does not systematically capture the pauses and F0 variations that delimit intonational units.
Galdino et al. (Mon,) studied this question.