Can a machine learning model infringe a copyright? Put differently, do machine learning models store protected content? This work-in-progress law review Article presents empirical data developed, in part, to answer that question, and the answer is yes. A set of unconditional image generators (diffusion models, n = 14) is trained on small slices of a dataset consisting of celebrities’ faces. The synthetic output of these generators is then compared to the training data using a variety of similarity metrics. As the empirical data shows, the question is not whether models can contain copyrighted works, but whether they do. In some cases, there is a 99% chance that a model will generate an image nearly identical to an image in its training data; in other cases, even after 10,000 generations, a model does not produce any image that may be considered identical (though finding similarity is nonetheless possible). This Article uses the empirical data to argue for a series of duties to be placed on model owners.
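The comparison described above, checking generated images against training images with a similarity metric, can be illustrated with a minimal sketch. The abstract does not name the Article's metrics, so this example substitutes plain pixel-space Euclidean distance. The flattened-list image representation, the function names, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence

# Hypothetical representation: an image is a flat sequence of pixel values.
Image = Sequence[float]


def l2_distance(a: Image, b: Image) -> float:
    """Euclidean distance between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_training_distance(sample: Image, training: Sequence[Image]) -> float:
    """Distance from one generated sample to its closest training image."""
    return min(l2_distance(sample, t) for t in training)


def near_duplicate_rate(
    generated: Sequence[Image], training: Sequence[Image], threshold: float
) -> float:
    """Fraction of generated samples within `threshold` of any training image."""
    if not generated:
        return 0.0
    hits = sum(
        1 for g in generated if nearest_training_distance(g, training) <= threshold
    )
    return hits / len(generated)


# Toy data standing in for a training slice and a batch of generations.
training = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
generated = [
    [0.0, 0.0, 0.0, 0.1],  # close to the first training image
    [0.5, 0.5, 0.5, 0.5],  # equally far from both
    [1.0, 0.9, 1.0, 1.0],  # close to the second training image
    [0.2, 0.8, 0.1, 0.9],  # far from both
]

# The threshold is an assumed cutoff, not a value taken from the Article.
rate = near_duplicate_rate(generated, training, threshold=0.2)
print(rate)
```

Pixel-space distance is only one possible notion of "nearly identical"; a perceptual or embedding-based metric would flag a different set of samples, which is why the choice of metric and threshold matters for any legal conclusion drawn from such counts.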
Nathan Reitinger studied this question.