This paper lays the foundation for using Large Language Models (LLMs) in the Slovenian legal domain. We address data scarcity in low-resource languages by constructing the largest publicly available Slovene Legal Corpus, spanning over one billion tokens from legislative, judicial, and governmental texts. We introduce PravniBERT, a domain-specific Slovene legal language model, and evaluate it on contradiction-based legal article retrieval, achieving 83.6% accuracy@3. Our results demonstrate the feasibility of applying LLMs to complex legal reasoning in under-resourced settings and highlight the potential for transparent, domain-adapted legal AI in Slovenia.
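As a point of reference for the reported figure, accuracy@3 is commonly read as the fraction of queries for which the correct item appears among the top three retrieved results. The sketch below assumes that definition and a hypothetical setup where each query yields a ranked list of article IDs and has exactly one gold article; the function name and the IDs are illustrative only and do not reflect the paper's evaluation code or data.

```python
from typing import Sequence


def accuracy_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int = 3) -> float:
    """Fraction of queries whose gold article ID appears among the top-k retrieved IDs.

    Assumes one ranked list and one gold ID per query (hypothetical setup).
    """
    if len(ranked) != len(gold):
        raise ValueError("need exactly one ranking per query")
    if not gold:
        return 0.0
    # Count a hit when the gold ID is in the first k positions of the ranking.
    hits = sum(1 for ranking, target in zip(ranked, gold) if target in ranking[:k])
    return hits / len(gold)


# Toy example with hypothetical article IDs (not data from the paper).
rankings = [
    ["art_12", "art_7", "art_3"],           # gold "art_3" at rank 3 -> hit
    ["art_1", "art_9", "art_4", "art_5"],   # gold "art_5" at rank 4 -> miss for k=3
    ["art_2", "art_8", "art_6"],            # gold "art_2" at rank 1 -> hit
]
gold = ["art_3", "art_5", "art_2"]
print(accuracy_at_k(rankings, gold, k=3))  # 2 hits out of 3 queries
```

Under this reading, a score of 83.6% accuracy@3 would mean the correct article was ranked in the top three for roughly 84 of every 100 queries.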
Malenšek et al. studied this question.