Large Language Models (LLMs) excel in translation, among other things, demonstrating competitive performance for many language pairs in zero- and few-shot settings. But unlike dedicated neural machine translation models, LLMs are not trained on any translation-related objective. What explains their remarkable translation abilities? Are these abilities grounded in “incidental bilingualism” in training data? Does instruction tuning contribute to them? Are LLMs capable of aligning and leveraging semantically identical or similar monolingual content from different corners of the internet that is unlikely to fit in a single context window? I offer some reflections on this topic, informed by recent studies and growing user experience. My working hypothesis is that LLMs’ translation abilities originate in two different types of pre-training data that may be internalized by the models in different ways: Local and Global. “Local learning” makes use of bilingual signals present within a single training context window (e.g., an English sentence soon followed by its Chinese translation in the training data). “Global learning,” in contrast, capitalizes on mining semantically related monolingual content that is spread out over the LLMs’ pre-training data. The key to explaining the origins of LLMs’ translation capabilities is a continuous iteration between Local and Global learning, which is a natural and helpful consequence of batch training. I discuss the prospects for testing the “duality hypothesis” empirically and its implications for reconceptualizing translation, human and machine, in the age of deep learning.
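The Local/Global distinction turns on whether a sentence and its translation fall inside one training context window. The following is a minimal illustrative sketch, not the author's method: it assumes, hypothetically, that pre-training text is concatenated into a single token stream and cut into non-overlapping fixed-size windows. The names `Segment`, `windows_touched`, and `classify_pair`, the token offsets, and the default window size of 2048 tokens are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A span of text in a hypothetical concatenated pre-training token stream."""
    text: str
    start: int  # first token offset (inclusive)
    end: int    # last token offset (exclusive)


def windows_touched(seg: Segment, window_size: int) -> set[int]:
    """Indices of the fixed-size, non-overlapping context windows the segment overlaps."""
    first = seg.start // window_size
    last = (seg.end - 1) // window_size
    return set(range(first, last + 1))


def classify_pair(src: Segment, tgt: Segment, window_size: int = 2048) -> str:
    """Label a translation pair 'local' if both sides sit in one window, else 'global'.

    'local': source and translation are visible together in a single context.
    'global': any cross-lingual link must be picked up across separate windows.
    """
    touched = windows_touched(src, window_size) | windows_touched(tgt, window_size)
    return "local" if len(touched) == 1 else "global"


if __name__ == "__main__":
    # Hypothetical offsets chosen only to illustrate the three cases.
    en = Segment("The cat sat on the mat.", start=100, end=108)
    zh_nearby = Segment("猫坐在垫子上。", start=110, end=118)
    zh_far = Segment("猫坐在垫子上。", start=900_000, end=900_008)
    en_straddling = Segment("The cat sat on the mat.", start=2044, end=2052)

    print(classify_pair(en, zh_nearby))             # local
    print(classify_pair(en, zh_far))                # global
    print(classify_pair(en_straddling, zh_nearby))  # global
```

The third case shows that even adjacent bilingual text counts as "global" under this simplification if it happens to straddle a window boundary. The sketch only labels pairs; it does not model the iteration between Local and Global learning that the hypothesis attributes to batch training.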
Yuri Balashov
Journal: Information
University of Georgia
Yuri Balashov studied this question.
DOI: https://doi.org/10.3390/info16121077
Source page: www.synapsesocial.com/papers/693624c34fa91c937236ccad