March 18, 2024 · Open Access

A Cross Search Method for Data Augmentation in Neural Machine Translation

Abstract

Large language models (LLMs) have shown excellent performance on general machine translation. However, LLMs suffer from high deployment cost and unsatisfactory quality in low-resource domains. To address this, we explore building base translation models with LLM-enhanced data augmentation. For data augmentation, we propose a cross search method to obtain a high-quality in-domain parallel corpus. The method comprises two distinct approaches: antagony-cross search and similarity-cross search. Antagony-cross search uses token-level control to generate monolingual data that closely aligns with the target domain. Similarity-cross search preserves the alignment between source and target sentences by applying a similarity score during back translation, so that the generated target sentences stay semantically close to their sources. With the proposed method, we generate millions of high-quality in-domain parallel sentence pairs from low-resource monolingual data. The method achieves improvements of approximately 0.5 to 4 BLEU points in these domains.
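The role of the similarity score in similarity-cross search can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the score is computed or where in the search it is applied, so the example below simply reranks a list of finished candidate translations after the fact. The names `select_by_similarity`, `embed_src`, `embed_tgt`, `min_score`, and the toy lexicon are all hypothetical, and the bag-of-words "embedding" stands in for whatever cross-lingual representation a real system would use.

```python
import math
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_by_similarity(
    source: str,
    candidates: Sequence[str],
    embed_src: Callable[[str], Vector],
    embed_tgt: Callable[[str], Vector],
    min_score: float = 0.0,
) -> Optional[Tuple[str, float]]:
    """Return the candidate most similar to the source, or None.

    Hypothetical helper: it reranks complete candidates, which is a
    simplification of scoring inside the translation search itself.
    """
    if not candidates:
        return None
    src_vec = embed_src(source)
    scored = [(cosine(src_vec, embed_tgt(c)), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    if best_score < min_score:
        return None  # discard pairs whose alignment looks too weak
    return best, best_score


# Toy stand-in for a shared cross-lingual space (assumption, for illustration).
TOY_LEXICON = {"der": "the", "patient": "patient", "hat": "has", "fieber": "fever"}


def toy_embed_src(sentence: str) -> Vector:
    """Map German tokens to English concepts via the toy lexicon and count them."""
    tokens = [TOY_LEXICON.get(t, t) for t in sentence.lower().split()]
    return {k: float(n) for k, n in Counter(tokens).items()}


def toy_embed_tgt(sentence: str) -> Vector:
    """Count lowercase English tokens."""
    return {k: float(n) for k, n in Counter(sentence.lower().split()).items()}


if __name__ == "__main__":
    result = select_by_similarity(
        "Der Patient hat Fieber",
        ["The weather is nice", "The patient has fever"],
        toy_embed_src,
        toy_embed_tgt,
        min_score=0.5,
    )
    print(result)
```

The `min_score` threshold reflects one plausible use of such a score: dropping generated pairs whose source and target drift apart, so that only well-aligned pairs enter the augmented corpus.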

Cite This Study

Zhang et al. (2024) studied this question.

synapsesocial.com/papers/68e73894b6db6435876b1f3f https://doi.org/10.1109/icassp48485.2024.10447171
