Indonesian court decisions contain rich legal knowledge about how judges interpret statutes, assess evidence, and determine sentences, yet much of this knowledge remains locked in semi-structured, inconsistently formatted documents. Rule-based systems break down on lengthy narrative sections, conventional machine learning models tend to overfit to noisy text, and large language models risk generating factual errors. This study introduces a hybrid framework that applies rule-based extraction to structured sections and a BERT-based pipeline to narrative text. The framework was developed on a corpus of 9,109 narcotics and corruption decisions, of which 200 were manually annotated by seven legal experts. A relevance filter first removes irrelevant passages, after which a fine-tuned LegalBERT model extracts entities from the remaining text. The entity extraction pipeline achieved an average F1-score of 84.39% across both domains (86.1% for corruption cases and 82.3% for narcotics cases). The resulting knowledge graph enabled prediction of sentence length (prison term) with 85% accuracy, a 56.9 percentage-point improvement over full-text baselines, which reached only 26–28% accuracy despite 95% training accuracy. This gap suggests that structured graph representations may capture legally meaningful patterns that unstructured text alone tends to miss. To our knowledge, this is the first legal knowledge graph built from Indonesian court decisions.
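The hybrid design described above can be illustrated with a minimal sketch. It assumes that the structured verdict section (amar putusan) uses the common phrasing "pidana penjara selama N (…) tahun dan M (…) bulan" and cites statutes as "Pasal X ayat (Y)"; the regular expressions, the keyword-based `relevance_filter`, the relation names, and the `toy_ner` function are all hypothetical stand-ins, not the paper's actual rules, relevance filter, or fine-tuned LegalBERT model. The sketch shows only the routing: rules for structured text, a pluggable entity extractor for filtered narrative text, and both emitting (subject, relation, object) triples for a knowledge graph.

```python
import re
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]
NerFn = Callable[[str], List[Tuple[str, str]]]

# Hypothetical patterns for the structured verdict section; real decisions vary more.
SENTENCE_RE = re.compile(
    r"pidana penjara selama\s+(\d+)\s*\([^)]*\)\s*tahun"
    r"(?:\s+(?:dan\s+)?(\d+)\s*\([^)]*\)\s*bulan)?",
    re.IGNORECASE,
)
ARTICLE_RE = re.compile(r"Pasal\s+(\d+)(?:\s+ayat\s+\((\d+)\))?", re.IGNORECASE)


def extract_structured(case_id: str, verdict: str) -> List[Triple]:
    """Rule-based pass: prison term (normalized to months) and cited articles."""
    triples: List[Triple] = []
    m = SENTENCE_RE.search(verdict)
    if m:
        months = int(m.group(1)) * 12 + int(m.group(2) or 0)
        triples.append((case_id, "sentence_months", str(months)))
    for art in ARTICLE_RE.finditer(verdict):
        label = f"Pasal {art.group(1)}"
        if art.group(2):
            label += f" ayat ({art.group(2)})"
        triples.append((case_id, "cites_article", label))
    return triples


def relevance_filter(paragraphs: List[str], keywords: Tuple[str, ...]) -> List[str]:
    """Hypothetical keyword filter standing in for the paper's relevance filter."""
    return [p for p in paragraphs if any(k in p.lower() for k in keywords)]


def build_graph(case_id: str, verdict: str, narrative: List[str], ner: NerFn) -> List[Triple]:
    """Combine rule-based triples with entities extracted from filtered narrative text."""
    triples = extract_structured(case_id, verdict)
    for para in relevance_filter(narrative, ("barang bukti", "terdakwa")):
        for entity, label in ner(para):
            triples.append((case_id, f"has_{label.lower()}", entity))
    # Drop duplicate triples while preserving their first-seen order.
    return list(dict.fromkeys(triples))


def toy_ner(text: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for a fine-tuned LegalBERT token classifier."""
    m = re.search(r"\d+(?:,\d+)?\s*gram\s+\w+", text)
    return [(m.group(0), "EVIDENCE")] if m else []


verdict = (
    "Menyatakan Terdakwa terbukti melanggar Pasal 114 ayat (1); "
    "menjatuhkan pidana kepada Terdakwa dengan pidana penjara selama "
    "4 (empat) tahun dan 6 (enam) bulan"
)
narrative = [
    "Barang bukti berupa 0,5 gram sabu disita dari Terdakwa.",
    "Cuaca pada hari itu cerah.",
]
triples = build_graph("PN-001", verdict, narrative, toy_ner)
for t in triples:
    print(t)
```

Normalizing the prison term to a single month count is one plausible way to make a structured target for sentence-length prediction; the irrelevant second narrative paragraph is discarded before entity extraction, mirroring the filter-then-extract order described in the abstract.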
Hairurahman et al. studied this question.