May 7, 2026 · Open Access

Coreference Resolution for Full German Novels using Large Language Models

Key Points

The study introduces the GerFuN dataset and evaluates coreference resolution methods on German novels.
GerFuN comprises five German-language novels, 450,000 tokens in total, fully annotated for character coreference.
Annotation followed a semi-manual pipeline that began with pre-annotation by a large language model.
The pre-annotations were then corrected manually in the INCEpTION tool.
The annotation guidelines build on existing approaches but are refined and made more explicit.
LLMs evaluated on GerFuN surpass previous coreference resolution pipelines.
The LLMs reach near-human accuracy on prototypical cases of particular interest to literary studies.

Abstract

This paper introduces the GerFuN dataset, which consists of five German-language novels fully annotated for character coreference, 450,000 tokens in total. Using a semi-manual pipeline, we first pre-annotated the novels with an LLM and then manually corrected the annotations in the INCEpTION tool. The annotation guidelines, which build on existing approaches but are refined and made more explicit, are presented in the paper and released alongside the dataset. Finally, we evaluate LLMs on GerFuN; they surpass previous pipelines and exhibit near-human accuracy on prototypical cases of particular interest to literary studies.
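
To make "evaluating coreference output against gold annotations" concrete, here is a minimal sketch of one widely used coreference score, B-cubed (B³). The abstract does not say which metrics the paper reports, so the choice of B³ is an assumption made purely for illustration. The mention encoding as `(start_token, end_token)` tuples and the function names `mention_to_cluster` and `b_cubed` are likewise hypothetical and not taken from the GerFuN release.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: a mention is an inclusive (start_token, end_token) span.
Mention = Tuple[int, int]
Cluster = Set[Mention]


def mention_to_cluster(clusters: List[Cluster]) -> Dict[Mention, Cluster]:
    """Map every mention to the cluster (entity) that contains it."""
    return {mention: cluster for cluster in clusters for mention in cluster}


def b_cubed(gold: List[Cluster], pred: List[Cluster]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under a simplified B-cubed scheme.

    Simplification (an assumption of this sketch): a mention that appears in
    only one of the two annotations contributes zero to the corresponding sum.
    """
    gold_of = mention_to_cluster(gold)
    pred_of = mention_to_cluster(pred)
    shared = gold_of.keys() & pred_of.keys()

    # Per-mention overlap between its gold cluster and its predicted cluster.
    overlap = {m: len(gold_of[m] & pred_of[m]) for m in shared}

    precision = (
        sum(overlap[m] / len(pred_of[m]) for m in shared) / len(pred_of)
        if pred_of else 0.0
    )
    recall = (
        sum(overlap[m] / len(gold_of[m]) for m in shared) / len(gold_of)
        if gold_of else 0.0
    )
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0 else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Two gold characters; the system attaches one mention to the wrong character.
    gold = [{(0, 1), (5, 5), (9, 9)}, {(3, 3), (7, 7)}]
    pred = [{(0, 1), (5, 5)}, {(9, 9), (3, 3), (7, 7)}]
    print(b_cubed(gold, pred))
```

Because B³ averages over mentions rather than links, a single misattributed pronoun in a long character chain lowers the score only slightly. That is one reason such mention-based scores are often reported alongside link-based ones for long documents like novels.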
