Large language models (LLMs) have shown excellent performance on general machine translation. However, LLMs suffer from high deployment cost and unsatisfactory quality in low-resource domains. To address this, we explore building base translation models with LLM-enhanced data augmentation. For data augmentation, we propose a cross search method to obtain high-quality in-domain parallel corpora. This method encompasses two distinct approaches: antagony-cross search and similarity-cross search. Antagony-cross search uses token-level control to generate monolingual data that closely aligns with the target domain. Similarity-cross search preserves the alignment between source and target sentences through a similarity score computed in back translation, so that the generated target sentence stays semantically close to the source sentence. With the proposed method, we generate millions of high-quality in-domain parallel sentences from low-resource monolingual data. Our proposed method achieves improvements of approximately 0.5 to 4 BLEU points in these low-resource domains.
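The role of a back-translation similarity score can be illustrated with a minimal sketch. This is not the authors' similarity-cross search, whose details the abstract does not give; it is a simplified reranking step under stated assumptions. The function names `cosine_similarity` and `select_by_back_translation` are hypothetical, the similarity measure is a plain bag-of-words cosine chosen only for self-containment, and the back-translation model is replaced by a toy lookup table.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Sequence, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (an assumed stand-in for the paper's score)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def select_by_back_translation(
    source: str,
    candidates: Sequence[str],
    back_translate: Callable[[str], str],
) -> Tuple[str, float]:
    """Return the candidate whose back translation is most similar to the source.

    `candidates` must be non-empty. `back_translate` is a hypothetical
    target-to-source translation function supplied by the caller.
    """
    scored = [(cosine_similarity(source, back_translate(c)), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Toy back-translation table standing in for a real translation model.
TOY_BACK_TRANSLATIONS = {
    "der Patient hat Fieber": "the patient has a fever",
    "der Patient ist müde": "the patient is tired",
}

best, score = select_by_back_translation(
    "the patient has a fever",
    list(TOY_BACK_TRANSLATIONS),
    TOY_BACK_TRANSLATIONS.__getitem__,
)
print(best, round(score, 3))
```

In this toy run the first candidate is selected because its back translation reproduces the source exactly. A realistic setup would replace the lookup table with a translation model and the bag-of-words cosine with a stronger semantic similarity measure.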
Zhang et al. studied this question.