March 1, 2026 · Open Access

Bridging the Gaps: Integrating Bibliographic Metadata Into Wikidata for Literary Corpora

Key Points

This paper aims to enhance the metadata of literary corpora by integrating bibliographic resources with Wikidata.
Conducted a feasibility analysis to establish a metadata integration workflow.
Drew on metadata from large literary collections such as Project Gutenberg and HathiTrust.
Explored procedural approaches and software tools for integration.
Identified significant gaps in existing bibliographic metadata coverage.
Proposed workflows to enrich literary metadata using Wikidata.
Developed open-source methods for improving metadata availability on digital platforms.
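
The feasibility question behind the gap analysis can be made concrete by measuring, for each metadata field the abstract names as crucial (year of publication, author gender, genre, publisher), how many corpus records actually carry a value. The sketch below is not taken from the authors' repository. The one-dict-per-text record layout and the field names are hypothetical.

```python
from typing import Mapping, Sequence

# Fields the abstract lists as crucial for large-scale literary analysis;
# the snake_case names are hypothetical labels chosen for this sketch.
TARGET_FIELDS = ("publication_year", "author_gender", "genre", "publisher")


def _is_filled(value: object) -> bool:
    """Treat None and the empty string as missing metadata."""
    return value not in (None, "")


def missing_fields(
    record: Mapping[str, object], fields: Sequence[str] = TARGET_FIELDS
) -> list[str]:
    """List the target fields that a single record lacks."""
    return [f for f in fields if not _is_filled(record.get(f))]


def field_coverage(
    records: Sequence[Mapping[str, object]], fields: Sequence[str] = TARGET_FIELDS
) -> dict[str, float]:
    """Share of records (0.0 to 1.0) with a non-empty value for each field."""
    if not records:
        return {f: 0.0 for f in fields}
    return {f: sum(_is_filled(r.get(f)) for r in records) / len(records) for f in fields}


if __name__ == "__main__":
    # Placeholder records with the minimal metadata a digital library might supply.
    sample = [
        {"author": "Author A", "title": "Title A", "publication_year": 1900},
        {"author": "Author B", "title": "Title B"},
    ]
    print(field_coverage(sample))
    # {'publication_year': 0.5, 'author_gender': 0.0, 'genre': 0.0, 'publisher': 0.0}
```

Running the same measurement before and after an enrichment step gives a simple per-field view of how much of each gap was actually closed.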

Abstract

This paper presents a case study on enhancing literary-corpus metadata by integrating large-scale bibliographic resources with Wikidata. Digital libraries such as Project Gutenberg or HathiTrust often provide only minimal metadata (e.g., author name and title). For large-scale literary analysis, however, it is crucial to include additional information such as year of publication, author gender, genre, or publisher. Yet enriching existing literary-corpus metadata with Wikidata is itself challenging, because significant gaps remain in Wikidata's coverage. In this case study, we draw on the metadata of a large literary corpus to address these gaps. We conduct a feasibility analysis to determine how to establish a workflow that integrates metadata from bibliographic catalogues into Wikidata as one step in the digital-humanities pipeline. We explore both procedural approaches and existing software tools and discuss the resulting challenges and limitations. Our methods are documented and open source; the full Python scripts and data-processing workflows are publicly available on GitHub.¹ The goal is to develop reproducible methods for sharing and improving metadata availability across open platforms.
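
The enrichment direction described in the abstract (filling gaps in minimal corpus records from Wikidata) might look roughly like the following sketch. It is not the authors' published code. It assumes matching works by the exact English labels of title and author, which is naive and will miss or conflate works. It reads the Wikidata properties P50 (author), P577 (publication date), P136 (genre), P123 (publisher) and P21 (sex or gender), and the record field names are hypothetical. The sketch only reads from the public Wikidata Query Service; writing catalogue data back into Wikidata is not shown.

```python
import json
import re
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical mapping from SPARQL result variables to corpus metadata fields.
FIELD_MAP = {
    "genderLabel": "author_gender",
    "genreLabel": "genre",
    "publisherLabel": "publisher",
}


def _sparql_string(text: str) -> str:
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_query(title: str, author: str, lang: str = "en") -> str:
    """Look up works by exact title and author label (a naive matching strategy)."""
    t, a = _sparql_string(title), _sparql_string(author)
    return f"""
SELECT ?work ?date ?genreLabel ?publisherLabel ?genderLabel WHERE {{
  ?work rdfs:label "{t}"@{lang} ;
        wdt:P50 ?author .
  ?author rdfs:label "{a}"@{lang} .
  OPTIONAL {{ ?work wdt:P577 ?date . }}
  OPTIONAL {{ ?work wdt:P136 ?genre . }}
  OPTIONAL {{ ?work wdt:P123 ?publisher . }}
  OPTIONAL {{ ?author wdt:P21 ?gender . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
LIMIT 10"""


def run_query(query: str, user_agent: str) -> list[dict]:
    """Send a query to WDQS and return the JSON result bindings (needs network)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+json", "User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


def merge_binding(record: dict, binding: dict) -> dict:
    """Fill only the fields that are empty in `record`; never overwrite catalogue data."""
    merged = dict(record)

    def is_missing(field: str) -> bool:
        return merged.get(field) in (None, "")

    if "date" in binding and is_missing("publication_year"):
        # xsd:dateTime values look like "1900-05-01T00:00:00Z"; keep the year only.
        match = re.match(r"(-?\d+)-", binding["date"]["value"])
        if match:
            merged["publication_year"] = int(match.group(1))
    for var, field in FIELD_MAP.items():
        if var in binding and is_missing(field):
            merged[field] = binding[var]["value"]
    return merged
```

Calling `run_query(build_query(title, author), user_agent=...)` needs network access and a descriptive User-Agent, which Wikimedia's User-Agent policy asks for. A single work can return several rows (for example, one per genre), and choosing among them is left open here. Keeping existing catalogue values and filling only empty fields is a design choice of this sketch, so that Wikidata never silently overrides the source catalogue.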

Authors

Katrin Rohrbacher

Friedrich-Alexander-Universität Erlangen-Nürnberg

David Schrittesser

Friedrich-Alexander-Universität Erlangen-Nürnberg

Journals

Journal of Open Humanities Data
