We present RoBaseLM-S (125M) and RoBaseLM-M (260M), two compact Romanian decoder-only language models trained from scratch on a 4.3B-token curated corpus. Architecturally, they follow a modern LLaMA-style recipe with pre-norm RMSNorm, rotary position embeddings, SwiGLU feed-forward blocks, grouped-query attention, and 4k-token context windows. We release both full-precision (FP16) and post-training 5-bit (Q5_K_M) checkpoints in GGUF format for lightweight local inference. The 5-bit variants fit under 500 MB and generate text in real time on a Jetson Nano 4 GB, enabling fully offline Romanian text generation on consumer-grade edge hardware. We evaluate the models intrinsically (multi-domain perplexity across news, literary prose, poetry, and heterogeneous web text) and extrinsically (LaRoSeDa sentiment classification and RO-STS sentence similarity). Relative to Romanian GPT-2–style baselines at similar parameter scales, RoBaseLM-S and RoBaseLM-M reduce perplexity substantially, e.g., from 30.7 to 15.9 on our held-out news split. The 5-bit post-training quantized checkpoints remain close to FP16 performance across all reported tasks. To our knowledge, these are the first Romanian small language models explicitly optimized for long-context inference, post-training quantization, and low-power on-device deployment.
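For readers unfamiliar with the intrinsic metric, the following is a minimal sketch of token-level perplexity, assuming the standard definition (the exponential of the mean negative log-likelihood per token). The abstract does not specify the tokenization or normalization used in the evaluation, so the function name `perplexity` and the toy inputs are illustrative assumptions, not the authors' evaluation code.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return perplexity given per-token natural-log probabilities.

    Assumes the standard definition: exp(-mean(log p(token_i | context))).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Toy input (hypothetical): a model that assigns probability 1/16
    # to every token behaves like a uniform choice among 16 options.
    toy_log_probs = [math.log(1 / 16)] * 8
    print(f"toy perplexity: {perplexity(toy_log_probs):.2f}")
```

Under this definition, a drop from 30.7 to 15.9 means the model is, on average, about as uncertain as a uniform choice among roughly 16 tokens rather than roughly 31.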
Diac et al. studied this question.