Mixup, an interpolation-based method that implicitly generates synthetic training examples, has proven effective in tasks such as image and text classification. Standard mixup linearly interpolates two randomly chosen samples (originally images) and their labels. In this paper, we apply mixup to low-resource machine translation by interpolating in the hidden space. We investigate how different mixing coefficients affect this technique, and we ask whether mixing semantically related or unrelated samples is more beneficial than mixing randomly selected ones. To answer this, we extend standard mixup to select samples by distance and experiment with different sampling settings. We conduct systematic experiments on several low-resource language pairs: Lower Sorbian and Upper Sorbian, Lower Sorbian and German, and Upper Sorbian and German. Our findings indicate that standard mixup improves translation quality, with an average gain of 1.9 BLEU points over the baseline Transformer model. The choice of mixing coefficient has minimal impact on translation quality, which suggests that tuning it is not essential to benefit from mixup. In addition, standard mixup is robust: selecting either the most similar or the most dissimilar samples for mixing does not yield a significant improvement over random selection.
Zhou et al. studied this question.
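To make the idea of hidden-space mixup with distance-based partner selection concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that each sentence is represented by a single pooled hidden vector, that targets are soft label vectors, that distance is Euclidean, and that the mixing coefficient is drawn from a Beta(alpha, alpha) distribution as in the original mixup formulation. The names `mix`, `pick_partner`, and `mixup_batch` are hypothetical.

```python
import math
import random


def mix(h_a, h_b, lam):
    """Convex combination lam * h_a + (1 - lam) * h_b of two equal-length vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]


def pick_partner(i, hidden, strategy, rng):
    """Return the index of the sample to mix with sample i.

    strategy is "random" (standard mixup), "nearest", or "farthest".
    Euclidean distance between hidden vectors is an assumption made here.
    """
    others = [j for j in range(len(hidden)) if j != i]
    if strategy == "random":
        return rng.choice(others)

    def dist(j):
        return math.dist(hidden[i], hidden[j])

    return min(others, key=dist) if strategy == "nearest" else max(others, key=dist)


def mixup_batch(hidden, targets, alpha=0.2, strategy="random", seed=0):
    """Mix every sample in the batch with one partner chosen by `strategy`.

    A single coefficient lam ~ Beta(alpha, alpha) is shared across the batch;
    the value alpha=0.2 is illustrative only.
    """
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)
    mixed_h, mixed_y = [], []
    for i in range(len(hidden)):
        j = pick_partner(i, hidden, strategy, rng)
        mixed_h.append(mix(hidden[i], hidden[j], lam))
        mixed_y.append(mix(targets[i], targets[j], lam))
    return mixed_h, mixed_y, lam
```

Switching `strategy` between `"random"`, `"nearest"`, and `"farthest"` mirrors the comparison described above between random selection and the most similar or most dissimilar samples, while varying `alpha` changes how the mixing coefficient is distributed.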