This paper presents a case study on enhancing literary-corpus metadata by integrating large-scale bibliographic resources with Wikidata. Digital libraries such as Project Gutenberg or HathiTrust often provide only minimal metadata (e.g., author name and title). For large-scale literary analysis, however, it is crucial to include additional information such as year of publication, author gender, genre, or publisher. Yet using Wikidata to enrich existing literary-corpus metadata is challenging, as significant gaps in coverage remain. In this case study, we draw on the metadata of a large literary corpus to address these gaps. We conduct a feasibility analysis to determine how to establish a workflow that integrates metadata from bibliographic catalogues into Wikidata as a step in the digital-humanities pipeline. We explore both procedural approaches and existing software tools and discuss the resulting challenges and limitations. Our methods are documented and open-source; the full Python scripts and data-processing workflows are publicly available on GitHub. The goal is to develop reproducible methods for sharing and improving metadata availability across open platforms.
Katrin Rohrbacher
Friedrich-Alexander-Universität Erlangen-Nürnberg
David Schrittesser
Friedrich-Alexander-Universität Erlangen-Nürnberg
Journal of Open Humanities Data
synapsesocial.com/papers/69a3ddf3ec16d51705d3055a — DOI: https://doi.org/10.5334/johd.483