What question did this study set out to answer?

The research aims to create a hybrid framework for analyzing Indonesian court decisions to improve how sentencing information is extracted and understood.

April 15, 2026 | Open Access

From unstructured text to structured reasoning: a hybrid knowledge graph for Indonesian sentencing analysis

Key Points

The framework combines rule-based extraction for structured sections with a BERT-based pipeline for narrative text.
Seven legal experts manually annotated 200 court decisions.
The dataset comprises 9,109 narcotics and corruption cases; a relevance filter cleaned the text before entity extraction.
Entities were extracted with a fine-tuned LegalBERT model.
Entity extraction achieved an average F1-score of 84.39% across both domains: 86.1% for corruption cases and 82.3% for narcotics cases.
The resulting knowledge graph enabled sentence length prediction with 85% accuracy, compared with 26–28% for full-text baselines.
This performance gap suggests that structured graph representations may capture legally meaningful patterns that unstructured text alone tends to miss.

Abstract

Indonesian court decisions contain rich legal knowledge about how judges interpret statutes, assess evidence, and determine sentences, yet much of this knowledge remains hidden in semi-structured, inconsistent documents. Rule-based systems fail on lengthy narratives, machine learning models overfit to noise, and large language models risk factual errors. This study introduces a hybrid framework that combines rule-based extraction for structured sections with a BERT-based pipeline for narrative text. The framework was developed using a dataset of 9,109 narcotics and corruption cases, of which seven legal experts manually annotated 200 decisions. A relevance filter cleaned the text before entity extraction with a fine-tuned LegalBERT model. The entity extraction pipeline achieved an average F1-score of 84.39% across both domains (86.1% for corruption cases and 82.3% for narcotics cases). The resulting knowledge graph enabled sentence length prediction with 85% accuracy, a 56.9 percentage-point improvement over full-text baselines, which reached only 26–28% test accuracy despite 95% training accuracy. This performance gap suggests that structured graph representations may better capture legally meaningful patterns that unstructured text alone tends to miss. Our study introduces what we believe is the first legal knowledge graph of Indonesian court decisions.
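
The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions, the section names (`header`, `verdict`, `narrative`), the entity labels, and the functions `extract_structured`, `extract_narrative_stub`, and `to_triples` are all hypothetical, and the narrative extractor is a placeholder where a fine-tuned model would be plugged in. The sketch only shows how rule-based output and model output might be merged into triples for a knowledge graph.

```python
import re
from typing import Callable

# Illustrative patterns only (assumptions); real decisions vary in wording and layout.
CASE_NUMBER = re.compile(r"Nomor\s+(\S+)")
PRISON_TERM = re.compile(
    r"pidana penjara selama\s+(\d+)\s+(?:\(\w+\)\s+)?tahun", re.IGNORECASE
)

Entity = tuple[str, str]        # (label, value)
Triple = tuple[str, str, str]   # (case id, relation, value)


def extract_structured(text: str) -> list[Entity]:
    """Rule-based pass over formulaic sections such as the header and verdict."""
    entities: list[Entity] = []
    match = CASE_NUMBER.search(text)
    if match:
        entities.append(("case_number", match.group(1)))
    match = PRISON_TERM.search(text)
    if match:
        entities.append(("prison_years", match.group(1)))
    return entities


def extract_narrative_stub(text: str) -> list[Entity]:
    """Placeholder for a fine-tuned token-classification model (hypothetical)."""
    return []


def to_triples(
    sections: dict[str, str],
    narrative_extractor: Callable[[str], list[Entity]] = extract_narrative_stub,
) -> list[Triple]:
    """Route each section to its extractor and merge the results into triples."""
    structured_text = sections.get("header", "") + "\n" + sections.get("verdict", "")
    entities = extract_structured(structured_text)
    entities += narrative_extractor(sections.get("narrative", ""))
    case_id = next((v for label, v in entities if label == "case_number"), "unknown")
    return [(case_id, label, v) for label, v in entities if label != "case_number"]


if __name__ == "__main__":
    decision = {
        "header": "Putusan Nomor 12/Pid.Sus/2020",
        "verdict": "Menjatuhkan pidana penjara selama 4 (empat) tahun",
        "narrative": "Free-text reasoning would go here.",
    }
    print(to_triples(decision))
```

Passing a different `narrative_extractor` swaps in a learned model without touching the rule-based path, which is the separation of concerns the abstract describes.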
