The rapid development of generative artificial intelligence (AI) and its expanding commercial applications have intensified the demand for large-scale, high-quality datasets, leading to frequent copyright disputes. These conflicts reveal the limitations of the current copyright framework in addressing the legality of text and data mining (TDM) during model training. Focusing on the use of commercially non-public, copyright-protected content in the training phase, this paper analyzes the tension between technical necessity and the legal constraints arising from the expansion of reproduction and adaptation rights. It examines the shortcomings of existing Chinese regulations, including the Administrative Measures for Generative Artificial Intelligence Services, in providing an operable and scalable authorization mechanism. Drawing on the historical functions of copyright and a comparative analysis of TDM exemption regimes in the EU and Japan, the study proposes a Chinese model based on a statutory licensing regime tailored to TDM. The proposed legal framework employs a semi-open legislative technique, featuring a structure of general clause, illustrative list, and residual provision, and is supported by the establishment of a new type of copyright management organization endowed with public authorization. This institution would oversee registration, remuneration distribution, and rights verification, ensuring lawful usage without prior consent while enabling equitable compensation. The proposed system aims to overcome licensing inefficiencies, reduce transaction costs, and promote coordinated development between copyright protection and data governance in the digital economy.
Yifan Xue studied this question.