What question did this study set out to answer?

The main aim is to develop compact Romanian language models optimized for low-power devices and long-context inference.

February 8, 2026 (Open Access)

Edge-Ready Romanian Language Models: Training, Quantization, and Deployment

Key Points

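Trained two Romanian decoder-only language models, RoBaseLM-S (125M) and RoBaseLM-M (260M), from scratch on a 4.3B-token curated corpus.
Used a modern LLaMA-style architecture with pre-norm RMSNorm, rotary position embeddings, SwiGLU feed-forward blocks, and grouped-query attention.
Evaluated intrinsic performance (multi-domain perplexity) and extrinsic performance (sentiment classification and sentence similarity).
Reduced perplexity relative to Romanian GPT-2-style baselines at similar parameter scales, e.g., from 30.7 to 15.9 on the held-out news split.
5-bit quantized models maintained performance comparable to the FP16 versions.
Enabled real-time text generation on low-power devices such as the Jetson Nano 4 GB.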
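
Since perplexity is the intrinsic metric quoted above, a minimal sketch of how it is conventionally computed may help when reading the numbers. It assumes per-token natural-log probabilities are already available from some model; the function name `perplexity` is illustrative, and nothing here is taken from the authors' evaluation code.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)

# Hypothetical input: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among 4 options.
example = [math.log(0.25)] * 8
print(perplexity(example))
```

Read this way, a drop from 30.7 to 15.9 means the model's average per-token uncertainty roughly halves, from about 31 equally likely choices to about 16.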

Abstract

We present RoBaseLM-S (125M) and RoBaseLM-M (260M), two compact Romanian decoder-only language models trained from scratch on a 4.3B-token curated corpus. Architecturally, they follow a modern LLaMA-style recipe with pre-norm RMSNorm, rotary position embeddings, SwiGLU feed-forward blocks, grouped-query attention, and 4k-token context windows. We release both full-precision (FP16) and post-training 5-bit (Q5_K_M) checkpoints in GGUF format for lightweight local inference. The 5-bit variants fit under 500 MB and generate text in real time on a Jetson Nano 4 GB, enabling fully offline Romanian text generation on consumer-grade edge hardware. We evaluate the models intrinsically (multi-domain perplexity across news, literary prose, poetry, and heterogeneous web text) and extrinsically (LaRoSeDa sentiment classification and RO-STS sentence similarity). Relative to Romanian GPT-2-style baselines at similar parameter scales, RoBaseLM-S and RoBaseLM-M reduce perplexity substantially, e.g., from 30.7 to 15.9 on our held-out news split. The 5-bit post-training quantized checkpoints remain comparable to FP16 performance across all reported tasks. To our knowledge, these are the first Romanian small language models explicitly optimized for long-context inference, post-training quantization, and low-power on-device deployment.
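
The size claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes weight storage dominates the file size and uses an effective rate of about 5.5 bits per weight for the 5-bit format; that rate is an illustrative assumption, not a figure reported in the abstract, and the helper name `weight_megabytes` is hypothetical. File metadata, any tensors kept at higher precision, and runtime memory such as the KV cache are ignored.

```python
def weight_megabytes(n_params, bits_per_weight):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6

# Parameter counts are from the abstract; 5.5 bits/weight is an assumed
# effective rate for the 5-bit format, not a reported number.
for name, n_params in [("RoBaseLM-S", 125e6), ("RoBaseLM-M", 260e6)]:
    fp16 = weight_megabytes(n_params, 16)
    q5 = weight_megabytes(n_params, 5.5)
    print(f"{name}: FP16 ~{fp16:.0f} MB, 5-bit ~{q5:.0f} MB")
```

Under these assumptions both quantized checkpoints land well under the 500 MB stated in the abstract, while the FP16 weights of the 260M model alone would be roughly 520 MB.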

Authors

T. A. Diac

Lund University

P. F. de Viana

Lund University

A. F. Neagoe

University of Bucharest
