Abstract

This article addresses the legal tensions between artificial intelligence (AI) development and copyright law, exploring policymaking on the use of copyrighted data for AI training at the input level and on the generation of AI content at the output level. To date, global policy responses have focused heavily on the input level, that is, on whether AI developers can lawfully use copyrighted data for training purposes. Jurisdictions such as the EU, the UK, the US, China and Japan have adopted varied approaches. After comparing these approaches, this article proposes shifting the focus from input restrictions to output regulation, a policy strategy referred to as ‘input out, output in’. It suggests that AI training should generally be lawful, while regulatory guardrails should apply to outputs that may compete directly with copyrighted works and deprive rightsholders of their deserved revenue. To harmonize the relationship between copyright holders and AI developers, key policy tools may include promoting transformative use, proper quotation and attribution, a Creative Commons-style framework and a safe harbour mechanism. This output-focused approach seeks to create positive-sum outcomes for copyright holders, AI developers and public information consumers. By ensuring free access to training data while moderating AI-generated content, the proposal supports innovation, protects creators’ interests and enhances public access to quality information, ultimately promoting a balanced and sustainable legal framework.
Jiawei Zhang (Fri,) studied this question.