What question did this study set out to answer?

To develop a theoretical framework for multilingual adaptation of language models in the Turkic language family.

March 15, 2026Open Access

Cross-Lingual Transfer and Parameter-Efficient Adaptation in the Turkic Language Family:\\ A Theoretical Framework for Low-Resource Language Models

Key Points

To develop a theoretical framework for multilingual adaptation of language models in the Turkic language family.
Focused on Azerbaijani, Kazakh, Uzbek, Turkmen, and Gagauz languages.
Examined the adaptation of large language models using parameter-efficient techniques.
Introduced the Turkic Transfer Coefficient to evaluate linguistic similarities.
Identified how syntactic structure influences multilingual adaptation performance.
Proposed a model connecting adaptation performance with model capacity and adaptation data size.

Abstract

Cross-lingual transfer and multilingual adaptation remain key challenges for large language models (LLMs), particularly for low-resource languages. While modern multilingual models support many languages, their performance varies significantly depending on the availability of training data and linguistic similarity between languages. This work develops a theoretical framework for parameter-efficient adaptation of LLMs within the Turkic language family, focusing on Azerbaijani, Kazakh, Uzbek, Turkmen, and Gagauz. These languages share strong typological and morphological similarities while differing considerably in digital resource availability, providing a useful environment for studying multilingual transfer. We integrate insights from multilingual representation learning and parameter-efficient fine-tuning techniques such as Low-Rank Adaptation (LoRA). A conceptual scaling model is proposed to describe how adaptation performance depends on model capacity, adaptation data size, and the expressivity of parameter-efficient modules. To formalize transfer potential within the Turkic language family, we introduce the Turkic Transfer Coefficient (TTC), a theoretical measure incorporating morphological similarity, lexical overlap, syntactic structure, and script compatibility. This framework highlights how linguistic similarity can facilitate efficient multilingual adaptation while also identifying structural limits in extremely low-resource settings. arXiv preprint: Keywords: large language models, multilingual NLP, cross-lingual transfer, parameter-efficient fine-tuning, LoRA, Turkic languages, low-resource NLP

Read Full Paperexternally

Ask AI

Helpful

Bookmark

View Full Paper