What question did this study set out to answer?

The research aims to assess the effects of quantization on the performance of Russian-language large language models (LLMs).

April 7, 2026

Exploring Posttraining Quantization of Large Language Models: An Efficiency Evaluation with a Focus on Russian-Language Tasks

Key Points

Conducted a systematic study of quantizing pretrained LLMs to 2.0–4.25 bits per parameter.
Examined models ranging from 4 billion to 32 billion parameters.
Covered both standard uniform quantization and specialized low-bit formats.
Quantization tolerance varies across different model architectures and sizes.
4-bit quantization shows high robustness, especially with advanced formats.
2-bit and 3-bit quantizations are sensitive to calibration data and scaling strategies.
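The sketch below illustrates why the range (scale) chosen for each group matters more at 2 and 3 bits. It applies plain round-to-nearest, asymmetric uniform quantization to groups of 64 weights and compares a min-max scale with a scale picked by a simple clipping-ratio search. The group size, the candidate clip ratios, the synthetic Gaussian weights and all helper names (`quantize_group`, `quantize_tensor`, `best_clip_ratio`) are illustrative assumptions, not details from the study. The search also minimizes weight reconstruction error rather than model output on calibration data. It shows the general mechanism only and is neither the authors' method nor a reproduction of their results.

```python
import random


def quantize_group(values, bits, clip_ratio=1.0):
    """Round-to-nearest asymmetric uniform quantization of one weight group.

    clip_ratio < 1.0 pulls both range endpoints toward zero before the scale
    is computed (a simple clipping-based scaling strategy for a zero-centred
    group). Returns the dequantized floats.
    """
    levels = 2 ** bits - 1                      # largest integer code
    lo, hi = min(values) * clip_ratio, max(values) * clip_ratio
    if hi <= lo:                                # constant group: nothing to quantize
        return list(values)
    scale = (hi - lo) / levels
    zero_point = round(-lo / scale)             # integer code that represents 0.0
    out = []
    for v in values:
        q = min(levels, max(0, round(v / scale) + zero_point))  # clamp to the grid
        out.append((q - zero_point) * scale)
    return out


def quantize_tensor(weights, bits, group_size=64, clip_ratio=1.0):
    """Quantize a flat list of weights group by group, each group with its own scale."""
    out = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size], bits, clip_ratio))
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_clip_ratio(weights, bits, group_size=64,
                    candidates=(1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7)):
    """Pick the clip ratio with the lowest reconstruction MSE (1.0 = plain min-max)."""
    return min(candidates,
               key=lambda r: mse(weights, quantize_tensor(weights, bits, group_size, r)))


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic stand-in for real weights

for bits in (4, 3, 2):
    err_minmax = mse(weights, quantize_tensor(weights, bits))
    ratio = best_clip_ratio(weights, bits)
    err_clip = mse(weights, quantize_tensor(weights, bits, clip_ratio=ratio))
    print(f"{bits}-bit: min-max MSE={err_minmax:.3e}, clip={ratio} MSE={err_clip:.3e}")
```

With b bits the step size is roughly (range) / (2^b - 1), so each bit removed about doubles the step. This is one intuition for why range and scale choices have a larger effect in the 2-bit and 3-bit settings than at 4 bits.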

Abstract

Quantization has become a key technique for compressing and accelerating large language models (LLMs). Although research into low-bit quantization is advancing actively for English-language LLMs, its impact on morphologically rich and resource-diverse languages, including Russian, remains far less studied. The development of high-performance Russian-language and multilingual LLMs makes further research into this problem necessary. We have conducted a systematic study of quantizing pretrained models to 2.0–4.25 bits per parameter for modern Russian-language LLMs at various scales, from 4 billion to 32 billion parameters (4B–32B). Our experimental setup covers both standard uniform quantization and specialized low-bit formats. Our findings highlight several key trends: (i) the tolerance of Russian-language LLMs to quantization varies across model architectures and sizes; (ii) 4-bit quantization demonstrates high robustness, particularly when advanced formats are employed; (iii) 3-bit and 2-bit quantizations prove to be the most sensitive to calibration data and scaling strategies. Empirical results show that the model’s domain must be taken into account when applying different quantization techniques.
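The 2.0–4.25 bits-per-parameter range can be related to memory savings with simple arithmetic. A fractional figure such as 4.25 commonly arises from per-group metadata (for example a scale) stored alongside the integer codes. The abstract does not say how the study's formats account for this overhead, so the group sizes and 16-bit scale/zero-point widths below are illustrative assumptions. The helper names are hypothetical, and the footprint counts weights only.

```python
def effective_bits(weight_bits, group_size, scale_bits=16, zero_bits=0):
    """Average storage per weight when every group of `group_size` weights
    also stores one scale (and optionally one zero point)."""
    return weight_bits + (scale_bits + zero_bits) / group_size


def weight_memory_gb(n_params, bits_per_param):
    """Weight storage in GB (10**9 bytes); ignores activations, KV cache, and unquantized layers."""
    return n_params * bits_per_param / 8 / 1e9


# One possible way to reach 4.25 bits: 4-bit codes plus a 16-bit scale per 64 weights.
print(effective_bits(4, group_size=64))                                # 4.25
# Another: a 16-bit scale and a 16-bit zero point per 128 weights.
print(effective_bits(4, group_size=128, scale_bits=16, zero_bits=16))  # 4.25

for n_params in (4e9, 32e9):                    # smallest and largest model sizes in the study
    for bpp in (16.0, 4.25, 3.0, 2.0):
        print(f"{n_params / 1e9:.0f}B @ {bpp} bits/param: {weight_memory_gb(n_params, bpp):.2f} GB")
```

Under these assumptions, a 32B model needs about 64 GB for 16-bit weights and about 17 GB at 4.25 bits per parameter, before any runtime memory is counted.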


Cite This Study

Poimanov et al. studied this question.

synapsesocial.com/papers/69d49ecbb33cc4c35a2278b8
https://doi.org/10.3103/s0005105525701389
