What question did this study set out to answer?

This study asks how large language models acquire translation abilities without being trained on any translation-specific objective. It frames the answer in terms of two kinds of learning from pre-training data: local and global.

December 8, 2025 · Open Access

Translation in the Wild

Key Points

Analysis of large language models' training processes and data integration
Discussion of bilingual signals and context windows
Evaluation of empirical testing strategies for the duality hypothesis
Large language models demonstrate effective translation capabilities despite not being specifically trained for translation
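Local learning uses bilingual signals that appear within a single training context window, while global learning draws on semantically related monolingual content spread across the pre-training data
The duality hypothesis holds that continuous iteration between local and global learning, a natural consequence of batch training, explains the origins of LLMs' translation capabilities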

Abstract

Large Language Models (LLMs) excel in translation, among other things, demonstrating competitive performance for many language pairs in zero- and few-shot settings. But unlike dedicated neural machine translation models, LLMs are not trained on any translation-related objective. What explains their remarkable translation abilities? Are these abilities grounded in “incidental bilingualism” in training data? Does instruction tuning contribute to it? Are LLMs capable of aligning and leveraging semantically identical or similar monolingual contents from different corners of the internet that are unlikely to fit in a single context window? I offer some reflections on this topic, informed by recent studies and growing user experience. My working hypothesis is that LLMs’ translation abilities originate in two different types of pre-training data that may be internalized by the models in different ways: Local and Global. “Local learning” makes use of bilingual signals present within a single training context window (e.g., an English sentence soon followed by its Chinese translation in the training data). “Global learning,” in contrast, capitalizes on mining semantically related monolingual contents that are spread out over the LLMs’ pre-training data. The key to explaining the origins of LLMs’ translation capabilities is a continuous iteration between Local and Global learning, which is a natural and helpful consequence of batch training. I discuss the prospects for testing the “duality hypothesis” empirically and its implications for reconceptualizing translation, human and machine, in the age of deep learning.
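
To make the notion of a "local" bilingual signal concrete, the sketch below splits a token stream into fixed-size windows and flags any window that mixes Latin-script and Chinese text. It is an illustration only, not a method from the paper: the helper names (`scripts_in`, `windows`, `has_local_signal`), the window size, and the use of script mixing as a stand-in for "a sentence followed by its translation" are all assumptions made for this example.

```python
import unicodedata


def scripts_in(text):
    """Return the coarse script labels ('latin', 'cjk') present in text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip punctuation, digits, whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            found.add("cjk")
        elif name.startswith("LATIN"):
            found.add("latin")
    return found


def windows(tokens, size):
    """Split a token list into consecutive, non-overlapping windows."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def has_local_signal(window):
    """Hypothetical proxy: a window counts as carrying a local bilingual
    signal if it contains both Latin-script and CJK text."""
    return {"latin", "cjk"} <= scripts_in(" ".join(window))


# Toy token stream: an English sentence soon followed by a Chinese one,
# then unrelated monolingual English text.
tokens = "The cat sleeps . 猫 在 睡觉 . Rain fell all night on the hills".split()
flags = [has_local_signal(w) for w in windows(tokens, size=8)]
```

With a window of 8 tokens, only the first window contains both scripts. Script mixing is a crude proxy: it cannot tell a genuine translation pair from unrelated bilingual text, and it says nothing about global learning, where related content never shares a window.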
