September 1, 2024

A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation

Abstract

Current research in speech-to-speech translation (S2ST) primarily concentrates on translation accuracy and speech naturalness, often overlooking key elements like paralinguistic information, which is essential for conveying emotions and attitudes in communication. To address this, our research introduces a novel, carefully curated multilingual dataset from various movie audio tracks. Each dataset pair is precisely matched for paralinguistic features and duration. We enhance this by integrating multiple prosody transfer techniques, aiming for translations that are accurate, natural-sounding, and rich in paralinguistic details. Our experimental results confirm that our model retains more paralinguistic information from the source speech while maintaining high standards of translation accuracy and naturalness.

Cite This Study

Min et al. (2024) studied this question.

synapsesocial.com/papers/68e59e8eb6db643587538966
https://doi.org/10.21437/interspeech.2024-2548
