To achieve high-quality translation with low latency, a Simultaneous Speech Translation (SimulST) system relies on a policy module to decide whether to translate immediately or wait for additional streaming input, along with a translation model capable of effectively handling partial speech input. Prior research has tackled these components separately, either using "wait-k" policies based on fixed-length segments or detected word boundaries, or dynamic policies based on different strategies (e.g., meaningful units), while employing offline models for prefix-to-prefix translation. In this paper, we propose Divergence-Guided Simultaneous Speech Translation (DiG-SST), a tightly integrated approach focusing on both translation quality and latency for streaming input. Specifically, we introduce a simple yet effective prefix-based strategy for training translation models with partial speech input, and develop an adaptive policy that makes read/write decisions for the translation model based on the expected divergence in translation distributions resulting from future input. Our experiments on multiple translation directions of the MuST-C benchmark demonstrate that our approach achieves a better trade-off between translation quality and latency compared to existing methods.
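The read/write decision described in the abstract can be illustrated with a minimal sketch. The abstract does not state which divergence measure, threshold, or estimator the authors use, so everything below is an assumption made for illustration: KL divergence as the measure, a fixed threshold of 0.1, the function names `kl_divergence` and `decide`, and the toy next-token distributions. It is not the DiG-SST implementation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def decide(expected_divergence, threshold=0.1):
    """Hypothetical policy: WRITE when more input is not expected to change
    the next-token distribution much, otherwise READ more speech."""
    return "WRITE" if expected_divergence < threshold else "READ"

# Toy next-token distributions over a 3-word vocabulary (made-up numbers).
p_partial = [0.7, 0.2, 0.1]            # prediction from the current speech prefix
p_future_stable = [0.68, 0.22, 0.10]   # prediction barely changes with more input
p_future_shifted = [0.1, 0.2, 0.7]     # prediction changes a lot with more input

stable_action = decide(kl_divergence(p_future_stable, p_partial))
shifted_action = decide(kl_divergence(p_future_shifted, p_partial))
```

In this toy setup the stable case yields a WRITE action and the shifted case yields a READ action. A real system would have to estimate the divergence without access to the future input, which is the part the abstract attributes to the proposed policy.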
Chen et al. studied this question.
www.synapsesocial.com/papers/68e72968b6db6435876a37ea — DOI: https://doi.org/10.1609/aaai.v38i16.29733
Xinjie Chen
Kai Fan
Wei Luo
Zhejiang University
South China University of Technology
Alibaba Group (Cayman Islands)