Domain-specific machine translation (MT) benefits significantly from large language models (LLMs) because of their strong instruction-following and in-context learning (ICL) abilities. In real-world applications, appropriate demonstrations and feedback are essential for helping LLMs refine their translations. However, in-domain samples and professional feedback are scarce, which creates practical limitations. Furthermore, the current ICL paradigm offers only parallel translation pairs, without fine-grained domain features. To address these challenges, we propose a pipeline that collects in-domain translations from LLMs and generates synthetic, human-like feedback for revising them. Each translation is stored together with its feedback to build a demonstration database, in which every instance pairs the original in-domain translation with its revision. During online translation, similar in-domain translations are retrieved as revision demonstrations, guiding LLMs to refine their outputs iteratively by learning from these demonstrations. We evaluate the proposed pipeline with open-source models such as Llama3-8B-Instruct and Mistral-7B-Instruct-v0.3 on five domain-specific benchmarks covering English-centric, Chinese-centric, and Portuguese-centric translation. The results demonstrate that the pipeline is effective at tailoring in-domain translations and improves translation performance compared with direct translation instructions. We further discuss the experimental results from four perspectives: 1) the effectiveness of different in-context retrieval methods; 2) the observed differences across the selected domains and languages; 3) a quantitative analysis of sentence-level and word-level statistics; and 4) the effect of the ICL retrieval database size and decoding parameters.
Yang et al. studied this question.
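The retrieve-then-revise loop described in the abstract can be illustrated with a minimal sketch. It assumes a demonstration database of (source, draft, feedback, revision) records. A simple bag-of-words cosine similarity stands in for the retriever, which is a substitution and not necessarily one of the retrieval methods the paper evaluates. The names `RevisionDemo`, `retrieve`, `build_revision_prompt`, and `refine`, the prompt template, and the `llm` callable (any function mapping a prompt string to generated text) are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class RevisionDemo:
    """One database entry (hypothetical schema)."""
    source: str    # in-domain source sentence
    draft: str     # initial LLM translation
    feedback: str  # synthetic, human-like feedback
    revision: str  # translation revised according to the feedback


def _bow(text: str) -> Counter:
    # Lowercased whitespace tokens; a stand-in for a real tokenizer.
    return Counter(text.lower().split())


def cosine_bow(a: str, b: str) -> float:
    # Cosine similarity between bag-of-words vectors (assumed retriever).
    ca, cb = _bow(a), _bow(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve(db: list[RevisionDemo], query: str, k: int = 2) -> list[RevisionDemo]:
    # Return the k demonstrations whose source is most similar to the query.
    return sorted(db, key=lambda d: cosine_bow(d.source, query), reverse=True)[:k]


def build_revision_prompt(demos: list[RevisionDemo], source: str, draft: str) -> str:
    # Each demonstration shows draft -> feedback -> revision; the new case
    # ends at "Feedback:" so the model writes feedback and then a revision.
    blocks = [f"Source: {d.source}\nDraft: {d.draft}\n"
              f"Feedback: {d.feedback}\nRevision: {d.revision}" for d in demos]
    blocks.append(f"Source: {source}\nDraft: {draft}\nFeedback:")
    return "\n\n".join(blocks)


def refine(llm, db: list[RevisionDemo], source: str, draft: str,
           rounds: int = 2, k: int = 2) -> str:
    # Iteratively revise the draft; assumes the model echoes "Revision: ..."
    # in its output (a hypothetical output format).
    demos = retrieve(db, source, k)
    for _ in range(rounds):
        output = llm(build_revision_prompt(demos, source, draft))
        draft = output.split("Revision:")[-1].strip()
    return draft
```

In practice, `llm` would wrap a call to an instruction-tuned model such as Llama3-8B-Instruct. The similarity function can be replaced with any retriever without changing the prompt-building or refinement logic.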