
March 15, 2026 · Open Access

Theft Address Extraction and Classification from Chinese Judicial Documents Based on Large Language Model

Key Points

Aimed to extract and classify crime addresses from Chinese judicial documents using a fine-tuned large language model (LLM).
Utilized a fine-tuned LLM named CAEC_LLM for address extraction and classification.
Developed a structured prompt engineering strategy for the task.
Evaluated the model's performance using metrics like F1-score and classification accuracy.
Achieved an F1-score of 0.79 for address extraction.
Obtained a classification accuracy of up to 0.74 for the best-performing category.
Demonstrated significant performance improvement over other large language models.

Abstract

Judicial documents have become a significant data source for crime geography research, offering advantages in accessibility and scale over highly restricted police-recorded crime data. However, extracting crime addresses from these texts is challenging because the address information is sparse, inconsistent, and often incomplete. Without proper classification, these addresses can introduce errors in geocoding and spatial analysis, compromising data quality. To address these limitations, we employed large language models (LLMs) together with a structured prompt engineering strategy tailored to this task. Specifically, we propose a fine-tuned LLM, named CAEC_LLM, that extracts addresses from judicial documents and classifies them into categories at different spatial scales. Experimental results show that the model achieved an F1-score of 0.79 for address extraction and a classification accuracy of up to 0.74 for the best-performing category, significantly outperforming other LLMs. This study makes two primary contributions: (1) an address classification scheme designed specifically for crime addresses, and (2) a fine-tuned LLM for extracting and classifying crime addresses from Chinese judicial documents, which enables LLMs to classify crime addresses into categories by spatial scale. These advancements support more accurate crime pattern analysis and data-driven urban planning.
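The abstract reports an F1-score for address extraction and a per-category accuracy for classification, but does not spell out how predictions are matched against the reference annotations. The following is a minimal Python sketch of how such metrics are commonly computed, not the study's evaluation code. It assumes that an extracted address counts as correct only if it exactly matches a gold address in the same document, and that "accuracy for a category" means the share of gold addresses in that category that receive the correct label. The function names, sample addresses, and category labels ("road", "building") are hypothetical and do not come from the study's classification scheme.

```python
from collections import Counter


def extraction_prf(predicted, gold):
    """Micro-averaged precision, recall and F1 for extracted addresses.

    Hypothetical helper. `predicted` and `gold` hold one list of address
    strings per document; a prediction is a true positive only if it exactly
    matches a gold address in the same document (multiset matching).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same documents")
    tp = fp = fn = 0
    for pred_doc, gold_doc in zip(predicted, gold):
        pred_counts, gold_counts = Counter(pred_doc), Counter(gold_doc)
        matched = sum((pred_counts & gold_counts).values())  # overlap of multisets
        tp += matched
        fp += sum(pred_counts.values()) - matched  # extracted but not in gold
        fn += sum(gold_counts.values()) - matched  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_category_accuracy(predicted_labels, gold_labels):
    """Share of gold addresses in each category that were labelled correctly.

    Hypothetical helper; assumes one predicted label per gold address.
    """
    if len(predicted_labels) != len(gold_labels):
        raise ValueError("label lists must be the same length")
    totals = Counter(gold_labels)
    correct = Counter(g for p, g in zip(predicted_labels, gold_labels) if p == g)
    return {category: correct[category] / n for category, n in totals.items()}


if __name__ == "__main__":
    # Hypothetical toy data: two documents with made-up addresses.
    predicted = [["杭州市西湖区文三路"], ["某小区3号楼", "超市门口"]]
    gold = [["杭州市西湖区文三路"], ["某小区3号楼"]]
    p, r, f = extraction_prf(predicted, gold)
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 1.0 0.8

    print(per_category_accuracy(["road", "building", "building"],
                                ["road", "road", "building"]))
    # {'road': 0.5, 'building': 1.0}
```

Exact matching is the strictest choice. Evaluations of Chinese address extraction sometimes relax it to partial or normalized matching, which would raise the reported scores, so the study's own definition should be checked before comparing numbers.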
