What question did this study set out to answer?

The main aim is to develop compact Romanian language models optimized for low-power devices and long-context inference.

February 8, 2026 | Open Access

Edge-Ready Romanian Language Models: Training, Quantization, and Deployment

Key Points

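Trained two Romanian decoder-only language models, RoBaseLM-S (125 M) and RoBaseLM-M (260 M), from scratch on a curated 4.3 B-token corpus.
Used a modern LLaMA-style architecture with pre-norm RMSNorm, rotary position embeddings, SwiGLU feed-forward blocks, and grouped-query attention.
Evaluated the models intrinsically (multi-domain perplexity) and extrinsically (LaRoSeDa sentiment classification and RO-STS sentence similarity).
Reduced perplexity on the held-out news split from 30.7 to 15.9 relative to Romanian GPT-2-style baselines of similar size.
5-bit quantized checkpoints maintained performance comparable to the FP16 versions.
Enabled real-time text generation on low-power devices such as the Jetson Nano 4 GB.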
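
As a reading aid for the perplexity figures above, here is a minimal sketch of how perplexity relates to per-token log-likelihood. It is not the study's evaluation code: the function names and the sample log-probabilities are hypothetical, and the sketch assumes natural-log probabilities. Perplexities measured with different tokenizers are not directly comparable, so the per-token gain is only indicative.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def nll_gain_nats(ppl_before, ppl_after):
    """Drop in mean negative log-likelihood (nats per token) implied by two perplexities."""
    return math.log(ppl_before) - math.log(ppl_after)


# Hypothetical data: a model that assigns probability 1/16 to every token
# has a perplexity of (approximately) 16.
uniform_logprobs = [math.log(1 / 16)] * 8
print(perplexity(uniform_logprobs))

# Figures reported for the held-out news split: 30.7 -> 15.9.
gain = nll_gain_nats(30.7, 15.9)
print(f"{gain:.3f} nats/token, {gain / math.log(2):.3f} bits/token")
```

Under these assumptions, the reported drop from 30.7 to 15.9 corresponds to roughly 0.66 nats (about 0.95 bits) less loss per token.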

Abstract

We present RoBaseLM-S (125 M) and RoBaseLM-M (260 M), two compact Romanian decoder-only language models trained from scratch on a 4.3 B-token curated corpus. Architecturally, they follow a modern LLaMA-style recipe with pre-norm RMSNorm, rotary position embeddings, SwiGLU feed-forward blocks, grouped-query attention, and 4k-token context windows. We release both full-precision (FP16) and post-training 5-bit (Q5_K_M) checkpoints in GGUF format for lightweight local inference. The 5-bit variants fit under 500 MB and generate text in real time on a Jetson Nano 4 GB, enabling fully offline Romanian text generation on consumer-grade edge hardware. We evaluate the models intrinsically (multi-domain perplexity across news, literary prose, poetry, and heterogeneous web text) and extrinsically (LaRoSeDa sentiment classification and RO-STS sentence similarity). Relative to Romanian GPT-2–style baselines at similar parameter scales, RoBaseLM-S and RoBaseLM-M reduce perplexity substantially, e.g., from 30.7 to 15.9 on our held-out news split. The 5-bit post-training quantized checkpoints remain comparable to FP16 performance across all reported tasks. To our knowledge, these are the first Romanian small language models explicitly optimized for long-context inference, post-training quantization, and low-power on-device deployment.
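
To make the "fit under 500 MB" claim concrete, the following back-of-the-envelope sketch estimates weight storage from parameter count and bits per weight. The parameter counts come from the abstract; the helper name and the effective rate of 5.5 bits per weight for the 5-bit format are assumptions for illustration, not figures from the paper. Real GGUF files also carry tokenizer data and metadata, and may keep some tensors at higher precision, so actual file sizes will differ somewhat.

```python
def weight_size_mb(n_params, bits_per_weight):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6


# Parameter counts as stated in the abstract.
MODELS = {"RoBaseLM-S": 125e6, "RoBaseLM-M": 260e6}

FP16_BITS = 16.0
Q5_BITS = 5.5  # ASSUMPTION: effective bits per weight, including quantization scales

for name, n_params in MODELS.items():
    fp16_mb = weight_size_mb(n_params, FP16_BITS)
    q5_mb = weight_size_mb(n_params, Q5_BITS)
    print(f"{name}: FP16 ~{fp16_mb:.0f} MB, 5-bit ~{q5_mb:.0f} MB")
```

Under these assumptions both 5-bit checkpoints land well below 500 MB, while the FP16 version of the 260 M model is about 520 MB of weights alone.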
