In Transformer-based Speech-to-Text (S2T) translation, an encoder-decoder model is trained end-to-end to take as input an untranscribed acoustic signal in the source language and directly generate a text translation in the target language. S2T translation models can also be trained in multilingual setups where a single front-end speech encoder is shared across multiple languages. A lingering question, however, is whether the encoder represents spoken utterances in a language-neutral space. In this paper, we present an interpretability study of encoder representations in a multilingual speech translation Transformer via various probing tasks. Our main findings show that while encoder representations are not entirely language-neutral, there exists a semantic subspace that is shared across different languages. Furthermore, we discuss our findings and the implications of our study for cross-lingual learning in spoken language understanding tasks.
Abdullah et al. studied this question.
www.synapsesocial.com/papers/68e59e8eb6db64358753897b — DOI: https://doi.org/10.21437/interspeech.2024-2109
Badr M. Abdullah
Mohammed Maqsood Shaik
Dietrich Klakow