The growing prevalence of artificial intelligence has renewed attention on the role of data in training large language models (LLMs). For decades, digital libraries and repositories have focused on providing well-structured, searchable, and openly accessible information to the public. As a result, these systems have become major targets for large-scale AI data harvesting. The volume and intensity of automated access now place significant strain on technical infrastructure and on the people who maintain it, often exceeding the capacity intended to serve human users. In response, some institutions have limited access or taken systems offline, raising challenges to long-standing commitments to openness and public service. This panel addresses the operational, ethical, and strategic questions emerging from this reality. Drawing on the work of a cross-institutional working group, the session brings together diverse perspectives from roles involved in repository stewardship. Panelists will discuss how AI-driven harvesting affects daily operations, planning, and decision-making, and how responsibilities and constraints vary across roles, institutions, and legal contexts. By creating space for cross-role dialogue, the panel aims to advance discussion around mitigation, responsibility, and sustaining public mandates in an evolving, AI-driven internet.
Griffith et al. (Wed,) studied this question.