Institutional repositories increasingly depend on external scholarly data sources to improve coverage, timeliness, and metadata quality. However, tight coupling between harvesting logic and repository platforms often introduces operational risk, complicates upgrades, and limits sustainability. This presentation describes the design and implementation of an independent crawler microservice developed for the Imec institutional repository. The service harvests publication metadata from Web of Science and Crossref, performs deduplication and controlled metadata merging, and communicates with DSpace exclusively through its REST API. By fully decoupling crawling and enrichment logic from the repository core, the solution enables independent scaling, configuration, and failure isolation, while remaining upgrade-safe across DSpace versions. We will present the crawler's architecture, deployment as a Linux service, incremental harvesting strategy, DOI-based deduplication, and a transparent metadata precedence model balancing licensed and open sources. The approach directly supports FAIR principles by improving findability, interoperability, and machine-actionability, while reducing long-term maintenance and preservation risk. The session concludes with lessons learned, design trade-offs, and recommendations for repository developers seeking resilient, future-proof integrations with emerging scholarly infrastructure.
Wawer et al. (Mon,) studied this question.
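The DOI-based deduplication and metadata precedence model mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the service's actual implementation: the record shape, the source labels `wos` and `crossref`, the `PRECEDENCE` table, and the helper names `normalize_doi` and `merge_records` are all hypothetical, and the abstract does not say which source takes priority for which field.

```python
from typing import Dict, Iterable, List

# Hypothetical precedence table: for each field, sources are listed from
# most to least preferred. The real per-field ordering is not given in the
# abstract; these values are placeholders.
PRECEDENCE: Dict[str, List[str]] = {
    "title": ["wos", "crossref"],
    "abstract": ["wos", "crossref"],
    "license": ["crossref", "wos"],
}
DEFAULT_ORDER: List[str] = ["wos", "crossref"]

DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a resolver or 'doi:' prefix.

    DOIs are case-insensitive, so lowercasing yields a stable dedup key.
    """
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def merge_records(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Group records by normalized DOI, then merge fields by precedence.

    Records without a DOI are skipped in this sketch; a real service would
    need a fallback matching strategy for them.
    """
    grouped: Dict[str, Dict[str, Dict[str, str]]] = {}
    for rec in records:
        doi = rec.get("doi")
        if not doi:
            continue
        grouped.setdefault(normalize_doi(doi), {})[rec["source"]] = rec

    merged: List[Dict[str, str]] = []
    for doi, by_source in grouped.items():
        fields = set()
        for rec in by_source.values():
            fields.update(k for k in rec if k not in ("doi", "source"))

        out: Dict[str, str] = {"doi": doi}
        for field in sorted(fields):
            # Take the first non-empty value in precedence order; sources
            # missing from the order are ignored for that field.
            for source in PRECEDENCE.get(field, DEFAULT_ORDER):
                value = by_source.get(source, {}).get(field)
                if value:
                    out[field] = value
                    break
        merged.append(out)
    return merged
```

Given one Crossref record and one Web of Science record for the same DOI, `merge_records` returns a single merged record keyed by the normalized DOI. Keeping the precedence table as plain data rather than branching logic is one way to make the merge rules transparent and configurable without touching the repository itself.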