March 3, 2026

Zaznava protislovij v pravnih besedilih (Contradiction Detection in Legal Texts)

Key Points

Data scarcity in low-resource languages can be addressed by building domain-specific corpora and legal language models, which in turn support better legal analysis.
The Slovene Legal Corpus, the largest publicly available corpus of its kind at over one billion tokens, draws on legislative, judicial, and governmental texts.
PravniBERT achieved 83.6% accuracy@3 on the contradiction-based legal article retrieval task.
Applying LLMs to legal reasoning in under-resourced settings is feasible and opens the way to transparent, domain-adapted legal AI systems in Slovenia.

Abstract

This paper lays the foundation for using Large Language Models (LLMs) in the Slovenian legal domain. We address data scarcity in low-resource languages by constructing the largest publicly available Slovene Legal Corpus, spanning over one billion tokens from legislative, judicial, and governmental texts. We introduce PravniBERT, a domain-specific Slovene legal language model, and evaluate it on contradiction-based legal article retrieval, achieving 83.6% accuracy@3. Our results demonstrate the feasibility of applying LLMs to complex legal reasoning in under-resourced settings and highlight the potential for transparent, domain-adapted legal AI in Slovenia.
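
For readers unfamiliar with the accuracy@3 metric reported above, the following is a minimal sketch of how accuracy@k is commonly computed for a retrieval task. It assumes exactly one correct (gold) article per query, which the abstract does not state; the function name `accuracy_at_k` and the article identifiers are hypothetical, and the toy data is illustrative only, not taken from the paper's evaluation.

```python
from typing import Sequence


def accuracy_at_k(
    ranked: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of queries whose gold article is among the top-k retrieved.

    Assumption: each query has a single gold article.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    # A query counts as a hit if its gold article is in the first k results.
    hits = sum(1 for results, target in zip(ranked, gold) if target in results[:k])
    return hits / len(gold)


# Toy example with hypothetical article identifiers (not real data).
ranked = [
    ["art_12", "art_7", "art_3", "art_9"],   # gold at rank 2: hit for k=3
    ["art_5", "art_1", "art_8", "art_2"],    # gold at rank 4: miss for k=3
    ["art_4", "art_6", "art_10", "art_11"],  # gold at rank 3: hit for k=3
]
gold = ["art_7", "art_2", "art_10"]

score = accuracy_at_k(ranked, gold, k=3)
print(f"accuracy@3 = {score:.3f}")
```

Under this definition, a score of 83.6% would mean that the correct article appeared among the top three retrieved candidates for 83.6% of the queries.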
