What question did this study set out to answer?

The aim is to enhance speech-to-text translation by incorporating emotional context and generative error correction methods.

June 17, 2026 | Open Access

Emotion-aware Speech Translation Correction with Large Language Models

Key Points

The study aims to improve speech-to-text translation of emotional speech by combining emotional context with generative error correction (GER).
A large language model is finetuned to generate the translation from the decoded N-best hypotheses.
Emotion and sentiment labels are incorporated into the finetuning process.
Describe-then-Translate (DtT) supervision has the model describe the emotion in a brief natural-language sentence before it generates the translation.
Combining GER with emotion/sentiment labels proved effective in experiments on the English-Chinese BMELD dataset.
DtT improves over label-only emotion supervision.

Abstract

We study emotion-aware speech-to-text translation (ST) through the lens of generative error correction (GER) with large language models (LLMs). First, we enhance the translation of emotional speech by adopting the GER paradigm: Finetune an LLM to generate the translation based on the decoded N-best hypotheses. Next, we combine the emotion and sentiment labels into the LLM finetuning process to enable the model to consider the emotion content. Moreover, we introduce Describe-then-Translate (DtT), a simple yet effective rationale-style supervision that makes the GER model first predict the emotion with a brief natural-language sentence and then generate the translation, which aligns with the LLM’s language-modeling objective. We conducted experiments on the English-Chinese BMELD dataset with different N-best generation strategies and GER models. The results show the effectiveness of the combination of GER and emotion/sentiment labels, and DtT improves over label-only emotion supervision.
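To make the data flow concrete, the following is a minimal sketch of how one finetuning example might be assembled for GER with label-only or DtT supervision. It is not the authors' implementation: the prompt wording, the bracketed label format, the choice to place labels in the target rather than the input, and the names Example, build_prompt, and build_target are all assumptions made for illustration, and the sample sentences are invented.

```python
from dataclasses import dataclass


@dataclass
class Example:
    hypotheses: list[str]  # N-best translation hypotheses from the ST decoder
    emotion: str           # emotion label, e.g. "joy"
    sentiment: str         # sentiment label, e.g. "positive"
    reference: str         # reference translation


def build_prompt(hypotheses: list[str]) -> str:
    """Format the N-best list as the LLM input (hypothetical template)."""
    numbered = [f"{i}. {h}" for i, h in enumerate(hypotheses, start=1)]
    return (
        "Candidate translations of one utterance:\n"
        + "\n".join(numbered)
        + "\nGive the corrected translation."
    )


def build_target(ex: Example, mode: str = "dtt") -> str:
    """Format the supervision target (assumed formats, not taken from the paper).

    mode="label": bare emotion/sentiment tags, then the translation.
    mode="dtt":   a short natural-language description, then the translation.
    """
    if mode == "label":
        prefix = f"[emotion: {ex.emotion}] [sentiment: {ex.sentiment}]"
    elif mode == "dtt":
        prefix = f"The speaker expresses {ex.emotion} with a {ex.sentiment} sentiment."
    else:
        raise ValueError(f"unknown mode: {mode}")
    return f"{prefix}\nTranslation: {ex.reference}"


if __name__ == "__main__":
    # Invented sample for illustration only.
    ex = Example(
        hypotheses=["我太高兴了", "我很高兴了", "我太高心了"],
        emotion="joy",
        sentiment="positive",
        reference="我太高兴了！",
    )
    print(build_prompt(ex.hypotheses))
    print(build_target(ex, mode="dtt"))
```

In this sketch the only difference between the two supervision styles is the first line of the target: a sentence in ordinary language for DtT versus bare tags for label-only supervision. This mirrors the abstract's point that a natural-language description aligns with the LLM's language-modeling objective.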
